Abstract - The more than thirty-year-old issue of the (classical) information capacity
of quantum communication channels was dramatically clarified in recent years, when a
number of direct quantum coding theorems were discovered. The present paper gives a
self-contained treatment of the subject, following classical information theory as closely
as possible and, on the other side, stressing profound differences of the quantum case.
Emphasis is placed on recent results, such as general quantum coding theorems including
cases of infinite (possibly continuous) alphabets and constrained inputs, the reliability
function for pure state channels, and the quantum Gaussian channel. Several still unsolved
problems are briefly outlined.



I. Introduction 



The issue of the information capacity of quantum communication channels arose in the
sixties (see, in particular, [Gor 62], [For 63], [Leb 63, 66], [Gor 64] and more references in
the survey [Cav 94]) and goes back to even earlier classical papers of Gabor and Brillouin,
asking for fundamental physical limits on the rate and quality of information transmission.
This work laid a physical foundation and raised the question of consistent quantum
information treatment of the problem. Important steps in this direction were made in the
seventies when quantum statistical decision (detection and estimation) theory [Hel 76],
[Hol 76] was created, providing a quantum probabilistic framework for this circle of problems.
At that time the quantum entropy bound and strict superadditivity of classical information
in quantum communication channels were established [Hol 73], [Hol 79].

Substantial progress has been achieved during the past two years, when a number of
direct quantum coding theorems were discovered, proving the achievability of the entropy
bound [Hau 96], [Hol 96], [Sch 97]. To a considerable extent this was stimulated by an
interplay between quantum communication theory and quantum information ideas
related to more recent developments in quantum computing (see e.g. [Ben 97]). The
question of information capacity is important in the theory of the quantum computer, which
is a highly specific information processing device, particularly in connection with quantum
error-correcting codes [Cal 96], [Ste 97].



In this paper we discuss transmission of classical information through quantum channels.
Remarkably, important probabilistic tools underlying the treatment of this case
have their roots, and in some cases direct prototypes, in classical Shannon theory, as
presented in particular in [Gal 68], [Cov 91]. The paper is intended to give a self-contained
and rigorous treatment of the subject, following classical information theory as closely as
possible and, on the other side, stressing profound differences of the quantum case.
Emphasis is placed on recent advances, and several still unsolved problems are briefly
outlined.

There is yet a "more quantum" domain of problems concerning reliable transmission of
entire quantum states under a given fidelity criterion [Ben 97]. The very definition of the
relevant "quantum information" is far from obvious. Important steps in this direction
were made in [DiV 98], where in particular a tentative converse of the relevant
coding theorem was suggested. However, the proof of the corresponding direct theorem
remains an open question.



II. General considerations 
§1. The quantum communication channels 

A communication channel in general can be described as an affine mapping which 
transforms states of the input system into states of the output system. States represent 
statistical ensembles that can be mixed, and the affinity property reflects fundamental 
requirement of preservation of the statistical mixtures. In case of classical systems states 
are described by probability distributions, and classical communication channel is just a 
transition mapping from input to output probability distributions. If at least one of the 
systems is quantum, one speaks of quantum communication channel. 

Let $\mathcal{H}$ be a Hilbert space providing a quantum-mechanical description for the physical
carrier of information. We do not require $\mathcal{H}$ to be finite-dimensional, as in quantum
communication this may well not be the case (while in applications to quantum computing
finite dimensions always suffice). We shall not dwell upon topological questions (unless
this is a matter of principle as in §IV.2), and the convergence of operator series below is
usually to be understood in the weak operator sense (although in some cases it is in fact
stronger, say in the norm sense).

A quantum state is a density operator, i.e. a positive operator $S$ in $\mathcal{H}$ with unit
trace, $\mathrm{Tr}\,S = 1$. Following Dirac's formalism, we shall denote vectors of $\mathcal{H}$ as
$|\psi\rangle$ and hermitean conjugate vectors of the dual space as $\langle\psi|$. Then
$\langle\phi|\psi\rangle$ is the inner product of $|\phi\rangle$, $|\psi\rangle$, and
$|\psi\rangle\langle\phi|$ is the outer product, i.e. the operator $A$ of rank 1 acting on a vector
$|\chi\rangle$ as $A|\chi\rangle = |\psi\rangle\langle\phi|\chi\rangle$. If $|\psi\rangle$ is a unit
vector, then $|\psi\rangle\langle\psi|$ is the orthogonal projection onto $|\psi\rangle$. This is a
special density operator, representing a pure state of the system. Pure states are the extreme
points of the convex set $\mathfrak{S}(\mathcal{H})$ of all states; an arbitrary state can
be represented as a mixture of pure states, i.e. by imposing classical randomness on
pure states. In this sense pure states are "noiseless", i.e. they contain no classical source
of randomness. By the spectral theorem, every density operator can be represented as a
mixture of pure states

$$S = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|,$$

where $\lambda_i$ are the eigenvalues and $|\psi_i\rangle$ are the eigenvectors of $S$. Note that
$\{\lambda_i\}$ form a probability distribution, i.e. a classical state on the set of eigenvectors
of $S$. This also means that classical states can always be embedded into
$\mathfrak{S}(\mathcal{H})$ by fixing some orthonormal system $\{|\psi_i\rangle\}$ in $\mathcal{H}$.

The following notion of quantum decision rule is a far-reaching generalization of the
standard notion of observable. Mathematically it is described by a resolution of identity
in $\mathcal{H}$, that is by a family $X = \{X_j\}$ of positive operators in $\mathcal{H}$ satisfying
$\sum_j X_j = I$, where $I$ is the unit operator in $\mathcal{H}$. The probability of taking a
decision $j$ if the decision rule $X$ is applied to a system in the state $S$ is postulated by the
following generalization of the Born statistical formula:

$$P(j|S) = \mathrm{Tr}\,S X_j.$$


From a physical point of view, a decision rule is implemented by a quantum measurement
including possible posterior processing of the measurement results (see [Hol 80], [Kra 83]
for more discussion). The standard notion of observable is recovered if one requires $X_j$ to
be mutually orthogonal projection operators, $X_j X_k = \delta_{jk} X_j$. The mapping
$S \to P(\cdot|S)$ is affine and it can be shown that any affine mapping from quantum states to
probability distributions has this form (see [Hol 80], Proposition 1.6.1). In fact, it is already an
example of a quantum channel (q-c channel, see below). A system of vectors
$\{|\phi_j\rangle\}$ in $\mathcal{H}$ is called overcomplete if
$\sum_j |\phi_j\rangle\langle\phi_j| = I$. Every overcomplete system (in particular
every orthonormal basis) gives rise to the decision rule $X$ for which
$X_j = |\phi_j\rangle\langle\phi_j|$ and

$$P(j|S) = \langle\phi_j|S\phi_j\rangle.$$

The classical case is embedded into this picture by assuming that all operators in
question commute, and hence are diagonal in some basis labelled by an index $\omega$; in fact, by
taking $S = \mathrm{diag}[S(\omega)]$, $X_j = \mathrm{diag}[X(j|\omega)]$, we have the classical state
$S$ and the classical decision rule $X$, such that $P(j|S) = \sum_\omega X(j|\omega) S(\omega)$.
Standard quantum observables correspond then to classical deterministic decision rules
(random variables).
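The Born-type formula $P(j|S) = \mathrm{Tr}\,S X_j$ for an overcomplete system can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a two-dimensional real Hilbert space and a hypothetical overcomplete system of three vectors at 120 degrees to one another; the helper names (`matmul`, `outer`, `decision_probabilities`) are illustrative only.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    """Trace of a 2x2 matrix."""
    return a[0][0] + a[1][1]

def outer(v):
    """Rank-one operator |v><v| for a real 2-vector v."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

# Assumed example (not from the text): three real vectors 120 degrees apart,
# scaled by sqrt(2/3) so that sum_j |phi_j><phi_j| = I.
scale = math.sqrt(2.0 / 3.0)
phis = [[scale * math.cos(2 * math.pi * j / 3),
         scale * math.sin(2 * math.pi * j / 3)] for j in range(3)]
X = [outer(phi) for phi in phis]

def decision_probabilities(S, X):
    """Return the list P(j|S) = Tr S X_j."""
    return [trace(matmul(S, Xj)) for Xj in X]

# Entrywise sum of the X_j; it should reproduce the unit operator.
identity_sum = [[sum(Xj[i][k] for Xj in X) for k in range(2)] for i in range(2)]

S = [[1.0, 0.0], [0.0, 0.0]]  # pure state with state vector (1, 0)
probs = decision_probabilities(S, X)
```

In this sketch the three outcomes are not orthogonal projections, so the rule is not a standard observable, yet the probabilities still sum to one because the $X_j$ resolve the identity.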

The earliest mathematical definitions of quantum communication channel [Ech 62]
described it essentially as an affine mapping of the convex set $\mathfrak{S}(\mathcal{H})$. One
sees easily that any such mapping $\Phi$ is a restriction to the set of quantum states of a positive
linear trace preserving mapping of the space of trace-class operators, and vice versa. However,
later it became clear that such a definition should be substantially narrowed by imposing
the fundamental condition of complete positivity [Hol 72], [Kra 83], [Lin 73-75]. A linear
mapping $\Phi$ is completely positive if for any finite collections of vectors
$\{\phi_j\}, \{\psi_j\} \subset \mathcal{H}$

$$\sum_{j,k} \langle\phi_j|\Phi[|\psi_j\rangle\langle\psi_k|]\phi_k\rangle \ge 0$$

(this is only one of possible equivalent definitions). It turns out that this property is
necessary and sufficient for physical realizability of the channel via unitary interaction
with another quantum system (the environment) [Kra 83], [Lin 73-75]. Based on a
fundamental result of Stinespring, one shows that an arbitrary linear completely positive trace
preserving mapping can be written (non-uniquely) in the form

$$\Phi[S] = \sum_k V_k S V_k^*, \tag{2}$$

with $\sum_k V_k^* V_k = I$. We shall call any such mapping a channel [footnote 1].
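The representation (2) is easy to experiment with numerically. The sketch below is an illustration under stated assumptions, not an example from the paper: it uses a hypothetical pair of $2 \times 2$ operators $V_0, V_1$ (an amplitude-damping-type channel with an assumed parameter `g`), checks the condition $\sum_k V_k^* V_k = I$, and applies the map to a density operator.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(a):
    """Hermitean conjugate of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def add(a, b):
    """Entrywise sum of two 2x2 matrices (returns a new matrix)."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

ZERO = [[0j, 0j], [0j, 0j]]

def apply_channel(kraus, S):
    """Phi[S] = sum_k V_k S V_k^* as in formula (2)."""
    out = ZERO
    for V in kraus:
        out = add(out, matmul(matmul(V, S), adjoint(V)))
    return out

def completeness(kraus):
    """sum_k V_k^* V_k, which must equal I for a trace preserving map."""
    total = ZERO
    for V in kraus:
        total = add(total, matmul(adjoint(V), V))
    return total

# Hypothetical operators chosen for illustration (g is an assumed parameter).
g = 0.3
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]],
         [[0.0, math.sqrt(g)], [0.0, 0.0]]]

S = [[0.5, 0.5], [0.5, 0.5]]  # pure state with state vector (1, 1)/sqrt(2)
out = apply_channel(kraus, S)
check = completeness(kraus)
```

The output keeps unit trace while the diagonal weight shifts toward the first basis vector, which is the kind of irreversible change a unitary evolution alone cannot produce.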

We now introduce an important class of channels. Let $\{S_i\}$ be a family of quantum
states and $\{X_i\}$ a resolution of identity in $\mathcal{H}$. Let

$$\Phi[S] = \sum_i S_i\, \mathrm{Tr}\,S X_i. \tag{3}$$

It is easy to check that this is a linear completely positive trace preserving mapping; it
is a good exercise to find a representation (2) for such $\Phi$. If $X_i = |e_i\rangle\langle e_i|$,
where $\{e_i\}$ is an orthonormal basis, we call it a classical-quantum (c-q) channel. As is easily
seen, it is equivalent to giving a mapping $i \to S_i$ from the classical input alphabet
$A = \{i\}$ to quantum states. If, moreover, all states $S_i$ commute, the channel is called
quasiclassical; such a channel is equivalent to a classical channel with transition probability
given by the eigenvalues $S(\omega|i)$ of the states $S_i$.

Footnote 1: A recent paper [Fuj 98] presents an attempt to investigate the capacity of channels
given by positive but not completely positive maps. Such attempts may be interesting in view of
the recent observation [Kos 97] that such channels might be realizable via interactions with a
more sophisticated environment (such as a non-Abelian gauge field) described by operators in a
graded tensor product of Hilbert spaces.

On the other hand, if $X_i$ are arbitrary and $S_i = |e_i\rangle\langle e_i|$, we call the channel
quantum-classical (q-c channel), as it is equivalent to giving a decision rule that maps quantum
states into probability distributions on the output alphabet $B = \{i\}$. The channels of the
form (3) by no means exhaust all possibilities; the simplest example of a channel which
is not of the form (3) is given by reversible evolution

$$\Phi[S] = V S V^*, \tag{4}$$

where $V$ is an arbitrary unitary operator.



§2. The entropy bound and the capacity of quantum channel 

If $\pi$ is a discrete probability distribution on $\mathfrak{S}(\mathcal{H})$, assigning
probability $\pi_i$ to the state $S_i$, we denote

$$\Delta H(\pi) = \sum_i \pi_i H(S_i; \bar S_\pi), \tag{5}$$

where

$$\bar S_\pi = \sum_i \pi_i S_i, \tag{6}$$

and

$$H(S; S') = \mathrm{Tr}\,S(\log S - \log S')$$

is the quantum relative entropy (see [Lin 73-75], [Weh 78], [Ohy 93] for a more careful definition
and discussion of the properties). Just as the relative entropy, the quantity $\Delta H(\pi)$ is
nonnegative but may be infinite. If

$$\sup_i H(S_i) < \infty, \tag{7}$$

where $H(S) = -\mathrm{Tr}\,S \log S$ is the quantum entropy, then

$$\Delta H(\pi) = H(\bar S_\pi) - \bar H_\pi(S_{(\cdot)}), \tag{8}$$

where $\bar H_\pi(S_{(\cdot)}) = \sum_i \pi_i H(S_i) < \infty$.
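Formula (8) can be evaluated directly for small examples. The sketch below is a hedged illustration rather than anything from the paper: it assumes real symmetric $2 \times 2$ density matrices, natural logarithms, and a hypothetical ensemble of two equiprobable pure states with overlap `eps`; all function names are made up for the example.

```python
import math

def eigenvalues_2x2(S):
    """Eigenvalues of a real symmetric 2x2 matrix via trace and determinant."""
    t = S[0][0] + S[1][1]
    d = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    root = math.sqrt(max(t * t / 4 - d, 0.0))
    return [t / 2 - root, t / 2 + root]

def entropy(S):
    """Quantum entropy H(S) = -Tr S log S in nats, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in eigenvalues_2x2(S) if x > 1e-15)

def delta_H(pi, states):
    """Right-hand side of (8): H(average state) - sum_i pi_i H(S_i)."""
    avg = [[sum(p * S[i][j] for p, S in zip(pi, states)) for j in range(2)]
           for i in range(2)]
    return entropy(avg) - sum(p * entropy(S) for p, S in zip(pi, states))

def pure_state(theta):
    """Projector onto the real unit vector (cos theta, sin theta)."""
    v = [math.cos(theta), math.sin(theta)]
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

# Assumed ensemble: two pure states with overlap eps, each with probability 1/2.
eps = 0.5
states = [pure_state(0.0), pure_state(math.acos(eps))]
value = delta_H([0.5, 0.5], states)

# For this ensemble the average state has eigenvalues (1 - eps)/2 and (1 + eps)/2.
lo, hi = (1 - eps) / 2, (1 + eps) / 2
expected = -lo * math.log(lo) - hi * math.log(hi)
```

Since the signal states here are pure, the second term in (8) vanishes and the value reduces to the entropy of the average state.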

Let $X = \{X_j\}$ be a decision rule, and let $P(j|i) = \mathrm{Tr}\,S_i X_j$. We denote by

$$I(\pi, X) = \sum_{i,j} \pi_i P(j|i) \log \frac{P(j|i)}{\sum_k \pi_k P(j|k)} \tag{9}$$

the classical mutual information between input and output random variables. The quantum
entropy bound says that

$$\sup_X I(\pi, X) \le \Delta H(\pi), \tag{10}$$

with the equality achieved if and only if all the operators $\pi_i S_i$ commute. The inequality
was explicitly conjectured in [Gor 64] in the context of conventional quantum measurement
theory. The proof for a finite number of states in a finite-dimensional Hilbert space,
based on the study of convexity properties of the quantities on both sides of (10), was given
in [Hol 73].

A different approach to this bound is related to the strong subadditivity of quantum
entropy [Lie 73] and the equivalent property of decrease of quantum relative entropy
under trace preserving completely positive maps, developed later in the series of papers
[Lin 73-75] and in [Uhl 77], namely

$$H(\Phi[S]; \Phi[S']) \le H(S; S')$$

for any states $S$, $S'$ and channel $\Phi$. This can be used to generalize the entropy bound
to the case of infinitely many states in infinite dimensions by choosing $\Phi$ to be the q-c
channel implementing the quantum decision rule (cf. [Yue 93]). It is also not difficult to
extend the initial proof given in [Hol 73], but the reformulation of the entropy bound in
terms of the relative entropy is important for a different reason: it extends to the case
where (7) does not hold, the signal states $S_i$ can have infinite entropy, and the formula
(8) is no longer valid.

If $\Phi$ is a channel, we denote by $I(\pi, \Phi, X)$ the mutual information defined analogously
to (9) but with the transition probabilities given by $P(j|i) = \mathrm{Tr}\,\Phi[S_i] X_j$, and

$$\Delta H(\pi, \Phi) = \sum_i \pi_i H(\Phi[S_i]; \Phi[\bar S_\pi]). \tag{11}$$

In order to consider block codes let us introduce the product channel
$\Phi^{\otimes n} = \Phi \otimes \ldots \otimes \Phi$ in the Hilbert space
$\mathcal{H}^{\otimes n} = \mathcal{H} \otimes \ldots \otimes \mathcal{H}$. Let us denote

$$C_n(\Phi) = \sup_\pi \sup_X I(\pi, \Phi^{\otimes n}, X); \qquad \bar C_n(\Phi) = \sup_\pi \Delta H(\pi, \Phi^{\otimes n}), \tag{12}$$

where the suprema are taken over all discrete probability distributions $\pi$ on
$\mathfrak{S}(\mathcal{H}^{\otimes n})$, and over all decision rules $X$ in $\mathcal{H}^{\otimes n}$.
It is easily seen, by taking product probability distributions $\pi$, that the quantities
$C_n(\Phi)$, $\bar C_n(\Phi)$ are superadditive:

$$C_n(\Phi) + C_m(\Phi) \le C_{n+m}(\Phi), \qquad \bar C_n(\Phi) + \bar C_m(\Phi) \le \bar C_{n+m}(\Phi).$$

This implies that the following limits exist:

$$C(\Phi) = \lim_{n\to\infty} C_n(\Phi)/n = \sup_n C_n(\Phi)/n, \tag{13}$$

$$\bar C(\Phi) = \lim_{n\to\infty} \bar C_n(\Phi)/n = \sup_n \bar C_n(\Phi)/n. \tag{14}$$

The entropy bound implies

$$C(\Phi) \le \bar C(\Phi).$$

We call the quantity $C(\Phi)$ the capacity of the channel $\Phi$. This definition is naturally
justified by an application of the classical Shannon coding theorem (cf. [Hol 79]), but
we shall give a different argument implying also (under some regularity conditions) the much
stronger statement

$$C(\Phi) = \bar C(\Phi). \tag{15}$$

For a classical channel $C_n(\Phi) = nC_1(\Phi)$ is additive, and trivially
$C(\Phi) = \bar C(\Phi) = C_1(\Phi)$. A striking feature of the quantum case is the possibility of the
inequality $C_1(\Phi) < C(\Phi)$, implying strict superadditivity of the information quantities
$C_n(\Phi)$ (see §III.2,3). In a sense, there is a kind of "quantum memory" in channels which are the
analog of classical memoryless channels. This fact is just another manifestation of "quantum
nonseparability", and in a sense is dual to the existence of Einstein - Podolsky - Rosen
correlations: the latter are due to entangled (non-factorizable) states and hold for disentangled
measurements, while the superadditivity is due to entangled measurements and holds for
disentangled states [Hol 79], [Per 91].

The paper [Ben 96] raised the general question of (super)additivity of the quantities
$\bar C_n(\Phi)$. If they are additive then

$$\bar C(\Phi) = \bar C_1(\Phi) = \sup_\pi \Delta H(\pi, \Phi),$$

which further greatly simplifies calculation of the capacity. This trivially holds for
reversible channels (4). The following Proposition shows that this is also true in somewhat
opposite cases.

Proposition 1. If $\Phi$ is a c-q or q-c channel, then

$$\bar C_n(\Phi) + \bar C_m(\Phi) = \bar C_{n+m}(\Phi).$$

Proof. It is sufficient to show that

$$\bar C_1(\Phi) + \bar C_1(\Phi) \ge \bar C_2(\Phi). \tag{16}$$

If $\Phi$ is a c-q channel,

$$\Phi[S] = \sum_i S_i \langle e_i|S e_i\rangle, \tag{17}$$

where $S_i$ are fixed states in $\mathcal{H}$, then

$$\sup_\pi \Delta H(\pi, \Phi) = \sup_\pi \Delta H(\pi),$$

where $\Delta H(\pi)$ is given by the expression (5) with these fixed states $S_i$. Let the
distribution $\pi$ assign the probability $\pi_{ij}$ to the state $S_i \otimes S_j$ in
$\mathcal{H} \otimes \mathcal{H}$. We have

$$\Delta H(\pi) \le \Delta H(\pi^1) + \Delta H(\pi^2), \tag{18}$$

where $\pi^1$ is the first marginal distribution of $\pi$, assigning probability
$\pi^1_i = \sum_j \pi_{ij}$ to the state $S_i$ in $\mathcal{H}$, and similarly $\pi^2$ is the second
marginal distribution, assigning probability $\pi^2_j = \sum_i \pi_{ij}$ to the state $S_j$ in
$\mathcal{H}$. In the finite dimensional case, where formula (8) always holds,
this follows from subadditivity of the entropy with respect to tensor products [Weh 78]
(see the proof of Lemma 2 in the Appendix of [Hau 96]). In the infinite dimensional case let us
consider a monotonically increasing sequence of orthogonal projections $P_r \uparrow I$ in
$\mathcal{H}$, and introduce

$$\Delta H_r(\pi) = \sum_i \pi_i H(P_r S_i P_r; P_r \bar S_\pi P_r).$$

By the properties of relative entropy [Lin 73-75], $\Delta H_r(\pi) \uparrow \Delta H(\pi)$. By using
(18) for normalized projected states, we obtain

$$\Delta H_r(\pi) \le \Delta H_r(\pi^1) + \Delta H_r(\pi^2) - \varphi\bigl(\mathrm{Tr}\,(P_r \otimes P_r) \bar S_\pi (P_r \otimes P_r)\bigr),$$

where $\varphi(x) = -x \log x$. Passing to the limit $r \to \infty$ gives (18) in the general case.
Taking in (18) the supremum over $\pi$ gives (16).
Now let $\Phi$ be a q-c channel

$$\Phi[S] = \sum_j \mathrm{Tr}\,S X_j\, |e_j\rangle\langle e_j|, \tag{19}$$

and let $\pi$ be a discrete probability distribution on $\mathfrak{S}(\mathcal{H})$, assigning
probability $\pi_k$ to a state $S_k$; then the density operators $\Phi[S_k]$ commute and

$$\Delta H(\pi, \Phi) = I(K; J)$$

is the classical mutual information (9) corresponding to the input probability distribution
$\pi$ and transition probability $P(j|k) = \mathrm{Tr}\,S_k X_j$. Here we denote by $K$ the input
random variable taking values $k$, and by $J$ the output random variable taking the values $j$. In
order to prove (16), consider states $S_k$ in the Hilbert space
$\mathcal{H} \otimes \mathcal{H}$; then the transition probability is

$$P(j_1 j_2|k) = \mathrm{Tr}\,S_k (X_{j_1} \otimes X_{j_2}) = P_1(j_1|j_2, k) P_2(j_2|k), \tag{20}$$

where

$$P_1(j_1|j_2, k) = \mathrm{Tr}_1 S^1_{j_2 k} X_{j_1}, \qquad P_2(j_2|k) = \mathrm{Tr}_2 S^2_k X_{j_2},$$

and

$$S^1_{j_2 k} = \frac{\mathrm{Tr}_2 S_k (I \otimes X_{j_2})}{\mathrm{Tr}\,S_k (I \otimes X_{j_2})}, \qquad S^2_k = \mathrm{Tr}_1 S_k.$$

Here we denote by $\mathrm{Tr}_r$ the (partial) trace with respect to the $r$-th factor ($r = 1, 2$) in
$\mathcal{H} \otimes \mathcal{H}$. We then have

$$\Delta H(\pi, \Phi \otimes \Phi) = I(K; J_1 J_2) = H(J_1 J_2) - H(J_1 J_2|K),$$

where $H(\cdot)$, $H(\cdot|\cdot)$ are, respectively, the classical entropy and conditional entropy of
the random variables. By subadditivity of the classical entropy,

$$H(J_1 J_2) \le H(J_1) + H(J_2).$$

On the other hand, (20) implies

$$H(J_1 J_2|K) = H(J_1|J_2 K) + H(J_2|K).$$

Combining, we get

$$I(K; J_1 J_2) \le I(K J_2; J_1) + I(K; J_2),$$

which amounts to

$$\Delta H(\pi, \Phi \otimes \Phi) \le \Delta H(\pi^1, \Phi) + \Delta H(\pi^2, \Phi),$$

where $\pi^1$ is the probability distribution on $\mathfrak{S}(\mathcal{H})$ assigning probability
$\pi_k P_2(j_2|k)$ to the state $S^1_{j_2 k}$, and $\pi^2$ is the probability distribution on
$\mathfrak{S}(\mathcal{H})$ assigning probability $\pi_k$ to the state $S^2_k$. Taking the supremum
over $\pi$ gives (16). □



§3. Formulation of the quantum coding theorem. The weak converse

We call a code of size $M$ a sequence $(S^1, X_1), \ldots, (S^M, X_M)$, where $S^k$ are states
and $\{X_k\}$ is a family of positive operators in $\mathcal{H}^{\otimes n}$ satisfying
$\sum_{k=1}^M X_k \le I$. Defining $X_0 = I - \sum_k X_k$, we have a resolution of identity in
$\mathcal{H}^{\otimes n}$. An output $k \ge 1$ means the decision that the state $S^k$ was
transmitted, while the output $0$ is interpreted as evasion of any decision. The average error
probability for such a code is

$$\lambda(S, X) = \frac{1}{M} \sum_{k=1}^M \bigl[1 - \mathrm{Tr}\,\Phi^{\otimes n}[S^k] X_k\bigr]. \tag{21}$$

Let us denote by $p(n, M)$ the infimum of this error probability with respect to all codes of
the size $M$.

Theorem 1. If $C(\Phi) < \infty$ and $R > C(\Phi)$, then $p(n, e^{nR}) \not\to 0$. On the other hand,
let $\Phi$ be a channel satisfying the condition

$$\sup_{S \in \mathfrak{S}(\mathcal{H})} H(\Phi[S]) < \infty, \tag{22}$$

then $p(n, e^{nR}) \to 0$ for $R < \bar C(\Phi)$. In particular, $C(\Phi) = \bar C(\Phi)$.

Proof. The proof of the first statement is based on the inequality

$$\log M \cdot (1 - p(n, M)) \le C_n(\Phi) + 1, \tag{23}$$

which is a simple corollary of the classical Fano inequality. Indeed, let $J$ be the classical
random variable describing the output of the product channel under the decision rule $X$
if the words in the code $(S, X)$ are taken with the input distribution $\pi_M$ assigning equal
probability $1/M$ to each state $S^k$, and let $K$ be the random variable whose value
is the number $k$ of the transmitted state. The Fano inequality [Gal 68], [Cov 91] implies

$$\log M \cdot (1 - \lambda(S, X)) \le I(K; J) + 1 \le C_n(\Phi) + 1.$$

Taking $M = e^{nR}$ and letting $n \to \infty$, we come to the conclusion $p(n, e^{nR}) \not\to 0$.
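The last step can be spelled out as a short worked derivation. It uses only (23), the choice $M = e^{nR}$, and the bound $C_n(\Phi) \le n\,C(\Phi)$, which follows from the definition (13) of $C(\Phi)$ as a supremum:

```latex
\begin{align*}
nR\,\bigl(1 - p(n, e^{nR})\bigr) &\le C_n(\Phi) + 1 \le n\,C(\Phi) + 1, \\
p(n, e^{nR}) &\ge 1 - \frac{C(\Phi)}{R} - \frac{1}{nR}, \\
\liminf_{n\to\infty} p(n, e^{nR}) &\ge 1 - \frac{C(\Phi)}{R} > 0
\qquad \text{for } R > C(\Phi).
\end{align*}
```

This is a weak converse: the error probability stays bounded away from zero above capacity, but the argument does not show that it tends to one.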

As for the second statement, here we shall show only how the proof for the general
case reduces to the case of a c-q channel (17) satisfying the condition

$$\sup_i H(S_i) < \infty. \tag{24}$$

The following Chapter will be devoted essentially to the treatment of that special case.

If $R < \bar C(\Phi)$, then we can choose $n_0$ and a probability distribution $\pi^0$ on
$\mathfrak{S}(\mathcal{H}^{\otimes n_0})$ such that $n_0 R < \Delta H(\pi^0, \Phi^{\otimes n_0})$.
Let $\pi^0$ assign probability $\pi^0_i$ to the state $S_i$ in $\mathcal{H}^{\otimes n_0}$, and
consider the c-q channel $\tilde\Phi$ in this Hilbert space given by the formula

$$\tilde\Phi[S] = \sum_i \Phi^{\otimes n_0}[S_i] \langle e_i|S e_i\rangle.$$

According to Proposition 1,

$$\bar C(\tilde\Phi) = \bar C_1(\tilde\Phi) = \sup_\pi \Delta H(\pi, \tilde\Phi) \ge \Delta H(\pi^0, \Phi^{\otimes n_0}),$$

which is greater than $n_0 R$. Denoting by $\tilde p(n, M)$ the minimal error probability for
$\tilde\Phi$, we have (assuming $n$ to be a multiple of $n_0$)

$$p(n, e^{nR}) \le \tilde p(n/n_0, e^{(n/n_0) n_0 R}),$$

since every code of size $M$ for $\tilde\Phi$ is also a code of the same size for $\Phi$. It follows
that if we prove the statement for the c-q channel $\tilde\Phi$, it will also be proved for the
initial channel $\Phi$. Let us now show that the condition (22) implies (24) for the channel
$\tilde\Phi$. Indeed,

$$\sup_i H(\Phi^{\otimes n_0}[S_i]) \le \sup_{S \in \mathfrak{S}(\mathcal{H}^{\otimes n_0})} H(\Phi^{\otimes n_0}[S]) \le n_0 \sup_{S \in \mathfrak{S}(\mathcal{H})} H(\Phi[S]) < \infty,$$

by subadditivity of quantum entropy with respect to tensor products. □

From now on we shall consider the c-q channel (17) in the Hilbert space $\mathcal{H}$, determined
by the mapping $i \to S_i$ from the input alphabet $A = \{i\}$ to $\mathfrak{S}(\mathcal{H})$, and
shall skip $\Phi$ from all notations. For a c-q channel the output states are fixed, and sending a
word $w = (i_1, \ldots, i_n)$ produces the tensor product state
$S_w = S_{i_1} \otimes \ldots \otimes S_{i_n}$. A code of size $M$ is a collection
$(w^1, X_1), \ldots, (w^M, X_M)$, where $w^k$ are words of length $n$. The average error probability
of the code is

$$\lambda(W, X) = \frac{1}{M} \sum_{k=1}^M \bigl[1 - \mathrm{Tr}\,S_{w^k} X_k\bigr]. \tag{25}$$

In terms of our previous definition this means that the input states can be taken as
products of the pure states

$$S^k = |e_{i_1}\rangle\langle e_{i_1}| \otimes \ldots \otimes |e_{i_n}\rangle\langle e_{i_n}|,$$

where $|e_i\rangle$ are taken from the representation (17). Using more general input states $S^k$
amounts to randomly chosen codewords, which cannot increase the rate of information
transmission. The proof of the Theorem will be completed in §III.2, but first we discuss
pure state channels in more detail.



III. The proof of the direct quantum coding theorem 
§1. The pure state channels 

Let us consider a pure state channel with $S_i = |\psi_i\rangle\langle\psi_i|$. Since the entropy of
a pure state is zero, the condition (7) is trivially satisfied and
$\Delta H(\pi) = H(\bar S_\pi)$ for such a channel. By discussing the pure state channel first we
shall follow the historical development of the subject and prepare for the considerably more
technical treatment of the general c-q channel. Also, in this case we can obtain more advanced
results concerning the asymptotic behavior of the error probability and the reliability function
that are still unavailable in the general case.

For a pure state channel, sending a word $w = (i_1, \ldots, i_n)$ produces the tensor product
vector $\psi_w = \psi_{i_1} \otimes \ldots \otimes \psi_{i_n} \in \mathcal{H}^{\otimes n}$. We are now
interested in obtaining upper bounds for the error probability $p(n, M)$ minimized over all codes
of size $M$. The first step has a geometric nature and amounts to obtaining a tractable upper
bound for the average error probability

$$\lambda(W, X) = \frac{1}{M} \sum_{k=1}^M \bigl[1 - \langle\psi_{w^k}|X_k \psi_{w^k}\rangle\bigr] \tag{26}$$

minimized over all decision rules. Minimization of (26) is the quantum Bayes problem with
uniform a priori distribution, and its solution is a natural analog of the maximum likelihood
decision rule. There are necessary and sufficient conditions for the solution of the quantum
Bayes problem [Hol 7?], which, however, can be solved explicitly only in some particular
cases, especially if the family of states has certain symmetry. It is therefore necessary to
look for a suitable approximation of the quantum maximum likelihood decision rule.

Let us restrict for a while to the subspace of $\mathcal{H}^{\otimes n}$ generated by the code
vectors $\psi_{w^1}, \ldots, \psi_{w^M}$, and consider the Gram matrix
$\Gamma = [\langle\psi_{w^i}|\psi_{w^j}\rangle]$ and the Gram operator
$G = \sum_{k=1}^M |\psi_{w^k}\rangle\langle\psi_{w^k}|$. This operator has the matrix $\Gamma$ with
respect to the overcomplete system

$$|\hat\psi_{w^k}\rangle = G^{-1/2}|\psi_{w^k}\rangle; \qquad k = 1, \ldots, M. \tag{27}$$

Following [Hol 78] consider the resolution of identity

$$X_k = |\hat\psi_{w^k}\rangle\langle\hat\psi_{w^k}|, \tag{28}$$

which will approximate the quantum maximum likelihood decision rule; the necessary
normalizing factor $G^{-1/2}$ is the source of entanglement in the decision rule (it is also
a major source of analytical difficulties in the noncommutative case). Note that the
vectors $\psi_{w^1}, \ldots, \psi_{w^M}$ need not be linearly independent; in the case of linearly
independent coherent state vectors (28) is related to the "suboptimal receiver" described in
[Hel 76], Sec. VI.3(e). By using this decision rule we obtain the upper bound



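$$\inf_X \lambda(W, X) \le \frac{2}{M}\,\mathrm{Sp}\,(E - \Gamma^{1/2}) = \frac{1}{M}\,\mathrm{Sp}\,(E - \Gamma^{1/2})^2, \tag{29}$$

where $E$ is the unit $M \times M$-matrix and Sp is the trace of an $M \times M$-matrix. Indeed, for
the decision rule (28)

$$\lambda(W, X) = \frac{1}{M} \sum_{k=1}^M \bigl[1 - |\langle\psi_{w^k}|\hat\psi_{w^k}\rangle|^2\bigr] \le \frac{2}{M} \sum_{k=1}^M \bigl[1 - \langle\psi_{w^k}|\hat\psi_{w^k}\rangle\bigr] = \frac{2}{M} \sum_{k=1}^M \bigl[1 - \langle\psi_{w^k}|G^{-1/2}\psi_{w^k}\rangle\bigr],$$

which is (29), since $\langle\psi_{w^k}|G^{-1/2}\psi_{w^k}\rangle$ is the $k$-th diagonal element of
$\Gamma^{1/2}$. In deriving the second relation in (29) we used
$\mathrm{Sp}\,\Gamma = \mathrm{Sp}\,E = M$. This bound is "tight" in the sense that there is a similar
lower bound [Hol 78]. However, it is difficult to use because of the presence of the square root
of the Gram matrix. A simpler but coarser bound is obtained by using the inequality

$$(1 - x^{1/2})^2 = (1 - x)^2 (1 + x^{1/2})^{-2} \le (1 - x)^2, \qquad x \ge 0, \tag{30}$$

applied to the eigenvalues of $\Gamma$:

$$\inf_X \lambda(W, X) \le \frac{1}{M}\,\mathrm{Sp}\,(E - \Gamma)^2 = \frac{1}{M}\,\mathrm{Tr} \sum_{r \ne s} S_{w^r} S_{w^s}. \tag{31}$$

As shown in [Hol 78], this bound is asymptotically equivalent (up to the factor 1/4) to
the tight bound (29) in the limit of "almost orthogonal" states $\Gamma \to E$. On the other
hand, different words are "decoupled" in (31), which makes it suitable for application of
the random coding.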
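The chain "exact error of the rule (28)" $\le$ "tight bound (29)" $\le$ "coarse bound (31)" can be seen in the smallest nontrivial case. The sketch below is an illustration under assumptions not made in the text: a code of $M = 2$ unit vectors with real overlap `eps`, so that the Gram matrix is $[[1, \varepsilon], [\varepsilon, 1]]$ with eigenvalues $1 \pm \varepsilon$. It relies on the identity that $\langle\psi_{w^k}|G^{-1/2}\psi_{w^k}\rangle$ is the $k$-th diagonal element of $\Gamma^{1/2}$; function names are hypothetical.

```python
import math

def sqrt_gram_two_states(eps):
    """Square root of the Gram matrix [[1, eps], [eps, 1]].

    The eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2) with eigenvalues
    1 + eps and 1 - eps, which gives the closed form below.
    """
    plus, minus = math.sqrt(1 + eps), math.sqrt(1 - eps)
    a, b = (plus + minus) / 2, (plus - minus) / 2
    return [[a, b], [b, a]]

def error_bounds(eps):
    """Return (exact error of rule (28), tight bound (29), coarse bound (31))."""
    M = 2
    root = sqrt_gram_two_states(eps)
    # Exact average error: (1/M) sum_k [1 - (Gamma^{1/2})_{kk}^2].
    exact = sum(1 - root[k][k] ** 2 for k in range(M)) / M
    # Tight bound (29): (2/M) Sp(E - Gamma^{1/2}).
    tight = 2 * sum(1 - root[k][k] for k in range(M)) / M
    # Coarse bound (31): (1/M) Sp(E - Gamma)^2 = (1/M) sum_{r != s} |Gamma_rs|^2.
    coarse = 2 * eps ** 2 / M
    return exact, tight, coarse

exact, tight, coarse = error_bounds(0.5)
```

For two equiprobable pure states the exact value coincides with $(1 - \sqrt{1 - \varepsilon^2})/2$, so in this special case the approximate maximum likelihood rule loses nothing; the gap between the tight and coarse bounds is what the typical-subspace argument later has to compensate.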



Just as in the classical case, we assume that the words $w^1, \ldots, w^M$ are chosen at random,
independently and with the probability distribution

$$P\{w = (i_1, \ldots, i_n)\} = \pi_{i_1} \cdot \ldots \cdot \pi_{i_n}. \tag{32}$$

Then for each word $w$ the expectation

$$\mathsf{E}\,S_w = \sum_{i_1, \ldots, i_n} \pi_{i_1} \ldots \pi_{i_n}\, S_{i_1} \otimes \ldots \otimes S_{i_n} = \bar S_\pi^{\otimes n}, \tag{33}$$

and by taking the expectation of the coarse bound (31) we obtain, due to the independence
of $w^r$, $w^s$,

$$p(n, M) \le \mathsf{E} \inf_X \lambda(W, X) \le (M - 1)\,\mathrm{Tr}\,(\bar S_\pi^{\otimes n})^2 = (M - 1)\, e^{-n(-\log \mathrm{Tr}\,\bar S_\pi^2)}.$$

By denoting

$$\tilde C = -\log \inf_\pi \mathrm{Tr}\,\bar S_\pi^2 = -\log \inf_\pi \sum_{i,j} \pi_i \pi_j |\langle\psi_i|\psi_j\rangle|^2, \tag{34}$$

we conclude that $C \ge \tilde C$. There are cases (e.g. the pure state binary channel, see below)
where $\tilde C > C_1$, so this suffices to establish the possibility of the inequality $C > C_1$,
and hence the strict superadditivity of $C_n$ [Hol 79], but it is not sufficient to prove the
coding theorem, since $\tilde C < \bar C$ unless the channel is quasiclassical. A detailed
comparison of the quantities $C_1$, $\tilde C$ for different quantum channels was made by Ban,
Hirota, Kato, Osaki and Suzuki [Ban 96]. The quantity $\tilde C$ was discussed in [Hol 79],
[Str 78], but its real information theoretic meaning is elucidated only in connection with the
reliability function and the quantum "cutoff rate" (see [Ban 98]).



The proof of the inequality $C \ge \bar C$ given in [Hau 96] achieves the goal by using the
approximate maximum likelihood improved with projection onto the "typical subspace"
of the density operator $\bar S_\pi^{\otimes n}$ and the correspondingly modified coarse bound for
the error probability. The coarseness of the bound is thus compensated by eliminating
"non-typical" (and hence far from being orthogonal) components of the signal state vectors.
More precisely, let us fix a small positive $\delta$, and let $\lambda_j$ be the eigenvalues and
$|e_j\rangle$ the eigenvectors of $\bar S_\pi$. Then the eigenvalues and eigenvectors of
$\bar S_\pi^{\otimes n}$ are $\lambda_J = \lambda_{j_1} \cdot \ldots \cdot \lambda_{j_n}$,
$|e_J\rangle = |e_{j_1}\rangle \otimes \ldots \otimes |e_{j_n}\rangle$, where
$J = (j_1, \ldots, j_n)$. The spectral projector onto the typical subspace is defined as

$$P = \sum_{J \in B} |e_J\rangle\langle e_J|,$$

where $B = \{J : e^{-n[H(\bar S_\pi) + \delta]} < \lambda_J < e^{-n[H(\bar S_\pi) - \delta]}\}$. This
concept plays a central role in "quantum data compression" [Joz 94]. In a more mathematical
context a similar notion appeared in [Ohy 93], Theorem 1.18. Its application to the present
problem relies upon the following two basic properties: first, by definition,

$$\|\bar S_\pi^{\otimes n} P\| < e^{-n[H(\bar S_\pi) - \delta]}. \tag{35}$$

Second, for fixed small positive $\epsilon$ and large enough $n$

$$\mathrm{Tr}\,\bar S_\pi^{\otimes n}(I - P) \le \epsilon, \tag{36}$$

because a sequence $J \in B$ is typical for the probability distribution given by the eigenvalues
$\lambda_j$ in the sense of classical information theory [Gal 68], [Cov 91].
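Property (36) and the size of the typical subspace can be explored numerically in the simplest case. The sketch below is an illustration under assumptions not in the text: the average state is a qubit state with eigenvalues `lam` and `1 - lam`, so an eigenvalue of the $n$-fold tensor power depends only on the number $k$ of factors equal to `lam` and has multiplicity $\binom{n}{k}$. Logarithms are natural, and the function name is hypothetical.

```python
import math

def typical_subspace_stats(lam, n, delta):
    """Weight Tr S^{(x)n} P and dimension Tr P of the typical projector P.

    S is assumed to be a qubit state with eigenvalues lam and 1 - lam.
    Returns (weight, dimension, entropy H(S) in nats).
    """
    H = -lam * math.log(lam) - (1 - lam) * math.log(1 - lam)
    weight, dim = 0.0, 0
    for k in range(n + 1):
        # log of the eigenvalue lam^k (1 - lam)^(n - k); logs avoid underflow.
        log_eig = k * math.log(lam) + (n - k) * math.log(1 - lam)
        # Membership in B: e^{-n(H + delta)} < eigenvalue < e^{-n(H - delta)}.
        if -n * (H + delta) < log_eig < -n * (H - delta):
            mult = math.comb(n, k)
            weight += math.exp(math.log(mult) + log_eig)
            dim += mult
    return weight, dim, H

w_small, d_small, H = typical_subspace_stats(0.2, 100, 0.05)
w_large, d_large, _ = typical_subspace_stats(0.2, 1000, 0.05)
```

In this sketch the weight of the typical subspace grows toward one as $n$ increases, while its dimension stays below $e^{n[H + \delta]}$, because every typical eigenvalue exceeds $e^{-n[H + \delta]}$ and the total weight cannot exceed one.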



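By replacing the signal state vectors $|\psi_{w^k}\rangle$ with the unnormalized vectors
$|\tilde\psi_{w^k}\rangle = P|\psi_{w^k}\rangle$, defining the corresponding approximate maximum
likelihood decision rule, and denoting by $\tilde\Gamma$ the corresponding Gram matrix, the upper
bound (29) is modified to

$$\inf_X \lambda(W, X) \le \frac{2}{M}\,\mathrm{Sp}\,(E - \tilde\Gamma^{1/2}).$$

By using the inequality

$$2 - 2x^{1/2} = (1 - x) + (1 - x^{1/2})^2 \le (1 - x) + (1 - x)^2, \qquad x \ge 0, \tag{37}$$

which follows from (30), we can obtain

$$\inf_X \lambda(W, X) \le \frac{1}{M} \bigl\{\mathrm{Sp}\,(E - \tilde\Gamma) + \mathrm{Sp}\,(E - \tilde\Gamma)^2\bigr\} \le \frac{1}{M} \sum_k \Bigl\{2\bigl[1 - \langle\psi_{w^k}|P\psi_{w^k}\rangle\bigr] + \sum_{l \ne k} |\langle\psi_{w^k}|P\psi_{w^l}\rangle|^2\Bigr\}.$$

Applying the random coding and using (33) and the properties (35), (36) of the typical
subspace, one gets for large $n$

$$p(n, M) \le 2\,\mathrm{Tr}\,\bar S_\pi^{\otimes n}(I - P) + (M - 1)\,\mathrm{Tr}\,(\bar S_\pi^{\otimes n} P)^2 \le 2\epsilon + (M - 1)\, e^{-n[H(\bar S_\pi) - \delta]},$$

resulting in $p(n, e^{nR}) \to 0$ for $R < \bar C - \delta$, and hence in the inequality
$C \ge \bar C$.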



§2. The quantum reliability function

In classical information theory the coding theorem can be proved without resorting to
typical sequences, by mere use of clever estimates for the error probability [Gal 68].
Moreover, in this way one obtains the exponential rate of convergence for the error probability,
the so-called reliability function

$$E(R) = \limsup_{n \to \infty} \frac{1}{n} \log \frac{1}{p(n, e^{nR})}, \qquad 0 < R < C. \tag{38}$$


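This suggests trying to obtain similar estimates in the quantum case.

Theorem 2. For all $M$, $n$ and $0 \le s \le 1$

$$\mathsf{E} \inf_X \lambda(W, X) \le 2(M - 1)^s \bigl[\mathrm{Tr}\,\bar S_\pi^{1+s}\bigr]^n.$$

Proof. The first step of our argument is to remark that

$$\frac{2}{M}\,\mathrm{Sp}\,(E - \Gamma^{1/2}) = \frac{2}{M}\,(M - \mathrm{Tr}\,G^{1/2}).$$

Consider two operator inequalities

$$-2G^{1/2} \le -2G + 2G, \tag{39}$$

$$-2G^{1/2} \le -2G + (G^2 - G). \tag{40}$$

The first one is obvious, while the second follows from (37). Taking the expectation with
respect to the probability distribution (32) we get

$$-2\,\mathsf{E}\,G^{1/2} \le -2\,\mathsf{E}\,G + \begin{cases} 2\,\mathsf{E}\,G, \\ \mathsf{E}(G^2 - G). \end{cases}$$

By using (33), we obtain

$$\mathsf{E}\,G = \mathsf{E} \sum_{k=1}^M |\psi_{w^k}\rangle\langle\psi_{w^k}| = M\,\mathsf{E}\,|\psi_w\rangle\langle\psi_w| = M \bar S_\pi^{\otimes n},$$

$$\mathsf{E}(G^2 - G) = \mathsf{E} \sum_{k=1}^M \sum_{l=1}^M |\psi_{w^k}\rangle\langle\psi_{w^k}|\psi_{w^l}\rangle\langle\psi_{w^l}| - \mathsf{E} \sum_{k=1}^M |\psi_{w^k}\rangle\langle\psi_{w^k}| = \mathsf{E} \sum_{k \ne l} |\psi_{w^k}\rangle\langle\psi_{w^k}|\psi_{w^l}\rangle\langle\psi_{w^l}| = M(M - 1)\bigl[\bar S_\pi^{\otimes n}\bigr]^2.$$

Let $\{e_J\}$ be the orthonormal basis of eigenvectors, and $\lambda_J$ the corresponding eigenvalues
of the operator $\bar S_\pi^{\otimes n}$. Then

$$-2\langle e_J|\mathsf{E}\,G^{1/2}|e_J\rangle \le -2M\lambda_J + M\lambda_J \min\bigl(2, (M - 1)\lambda_J\bigr).$$

By using the inequality $\min\{a, b\} \le a^s b^{1-s}$, $0 \le s \le 1$, we get

$$\min\bigl(2, (M - 1)\lambda_J\bigr) \le 2(M - 1)^s \lambda_J^s, \qquad 0 \le s \le 1.$$

Summing with respect to $J$ and dividing by $M$, we get from (29)

$$\mathsf{E} \inf_X \lambda(W, X) \le 2(M - 1)^s \sum_J \lambda_J^{1+s} = 2(M - 1)^s \bigl[\mathrm{Tr}\,\bar S_\pi^{1+s}\bigr]^n, \qquad 0 \le s \le 1.$$

□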



It is natural to introduce the function $\mu(\pi, s)$, similar to the analogous function in
classical information theory ([Gal 68], Ch. 5):

$$\mu(\pi, s) = -\log \mathrm{Tr}\,\bar S_\pi^{1+s}. \tag{41}$$

By taking $M = e^{nR}$, we obtain

$$E(R) \ge \sup_{0 \le s \le 1} \Bigl(\sup_\pi \mu(\pi, s) - sR\Bigr) \equiv E_r(R). \tag{42}$$

In particular, it follows easily that

$$C \ge \sup_\pi \mu'(\pi, 0) = \bar C.$$

Thus the rate $\bar C - \delta$ can be attained with the approximate maximum likelihood decision
rule (28), (27) without even projecting onto the typical subspace.
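The bound (42) is straightforward to evaluate once the eigenvalues of the average state are known. The following sketch is illustrative only and rests on assumptions beyond this paragraph: it takes two equiprobable pure states with overlap `eps`, for which the average state has eigenvalues $(1 \mp \varepsilon)/2$, keeps the input distribution fixed instead of optimizing over it, and replaces the supremum over $s$ by a maximum over a finite grid. Function names are hypothetical and logarithms are natural.

```python
import math

def mu(s, eps):
    """mu(s) = -log Tr S^{1+s} for eigenvalues (1 - eps)/2 and (1 + eps)/2."""
    lo, hi = (1 - eps) / 2, (1 + eps) / 2
    return -math.log(lo ** (1 + s) + hi ** (1 + s))

def random_coding_exponent(R, eps, grid=1000):
    """Grid approximation of max over 0 <= s <= 1 of mu(s) - s * R."""
    return max(mu(i / grid, eps) - (i / grid) * R for i in range(grid + 1))

def entropy_of_average_state(eps):
    """H of the average state, i.e. the derivative of mu at s = 0."""
    lo, hi = (1 - eps) / 2, (1 + eps) / 2
    return -lo * math.log(lo) - hi * math.log(hi)

eps = 0.5
H_bar = entropy_of_average_state(eps)
below = random_coding_exponent(0.9 * H_bar, eps)   # rate below the entropy
above = random_coding_exponent(H_bar + 0.1, eps)   # rate above the entropy
low_rate = random_coding_exponent(0.1, eps)        # maximum attained at s = 1
```

Because $\mu$ is concave in $s$ with $\mu(0) = 0$, the exponent is positive exactly for rates below $\mu'(0) = H(\bar S_\pi)$, and at sufficiently low rates the maximum sits at $s = 1$, giving the straight line $\mu(1) - R$.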

On the other hand, it appears possible to apply in the quantum case the "expurgation"
technique from [Gal 68], Sec. 5.7, resulting in the bound

$$E(R) \ge \sup_{s \ge 1} \Bigl(\sup_\pi \tilde\mu(\pi, s) - sR\Bigr) \equiv E_{ex}(R),$$

where

$$\tilde\mu(\pi, s) = -s \log \sum_{i,k} \pi_i \pi_k |\langle\psi_i|\psi_k\rangle|^{2/s}.$$

The behavior of the lower bounds $E_r(R)$, $E_{ex}(R)$ can be studied by the methods of classical
information theory, see [Bur 97], and is indeed similar to the classical picture, where $E_r(R)$
gives the better bound for big rates $R$, while $E_{ex}(R)$ is better for small rates; in an
intermediate region of rates the bounds $E_r(R)$, $E_{ex}(R)$ have a common linear portion
$\tilde C - R$, where

$$\tilde C = \sup_\pi \mu(\pi, 1) = \sup_\pi \tilde\mu(\pi, 1) = -\log \inf_\pi \mathrm{Tr}\,\bar S_\pi^2. \tag{43}$$

This means that the quantity (34) is a quantum analog of the "cutoff rate" concept [Ban 98]
widely used in practical applications of information theory. Figures 1, 2 present the
typical behavior of the functions $\mu$, $\tilde\mu$ and Gallager's exponents $E_r$, $E_{ex}$
(modeled from the binary quantum channel, see below).



§3. The binary quantum channel 



Maximization of the bounds $E_r(\pi, R)$, $E_{ex}(\pi, R)$ over $\pi$, which is a difficult problem
even in the classical case, is still more difficult in the quantum case. However, if the
distribution $\pi^0$ maximizing either $\mu(\pi, s)$ or $\tilde\mu(\pi, s)$ is the same for all $s$,
then

$$E_r(R) = E_r(\pi^0, R), \qquad E_{ex}(R) = E_{ex}(\pi^0, R).$$

This is the case for the binary pure state channel.

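Let $|\psi_0\rangle$, $|\psi_1\rangle$ be two pure state vectors with $|\langle\psi_0|\psi_1\rangle| = \varepsilon$, in a two dimensional Hilbert space $\mathcal H$. Consider the operator $S_\pi = (1-\pi)S_0 + \pi S_1$, where in notations the distribution $\pi$ is identified with the probability of the letter 1. Its eigenvectors have the form $|\psi_0\rangle + \alpha|\psi_1\rangle$ with some $\alpha$. Therefore for its eigenvalues we get the equation

$$\big((1-\pi)|\psi_0\rangle\langle\psi_0| + \pi|\psi_1\rangle\langle\psi_1|\big)\big(|\psi_0\rangle + \alpha|\psi_1\rangle\big) = \lambda\big(|\psi_0\rangle + \alpha|\psi_1\rangle\big).$$

Solving it, we find the eigenvalues

$$\lambda_1(\pi) = \frac{1 - \sqrt{1 - 4(1-\varepsilon^2)\pi(1-\pi)}}{2}, \qquad \lambda_2(\pi) = \frac{1 + \sqrt{1 - 4(1-\varepsilon^2)\pi(1-\pi)}}{2}.$$

It is easy to check that both functions

$$\mu(\pi,s) = -\log\big(\lambda_1(\pi)^{1+s} + \lambda_2(\pi)^{1+s}\big),$$

$$\tilde\mu(\pi,s) = -s\log\big(\pi^2 + (1-\pi)^2 + 2\pi(1-\pi)\varepsilon^{2/s}\big)$$

are maximized by $\pi = 1/2$. Denoting

$$\mu(s) = \mu(1/2, s) = -\log\left[\Big(\frac{1-\varepsilon}{2}\Big)^{1+s} + \Big(\frac{1+\varepsilon}{2}\Big)^{1+s}\right],$$

$$\tilde\mu(s) = \tilde\mu(1/2, s) = -s\log\left[\frac12 + \frac{\varepsilon^{2/s}}{2}\right],$$

we get the following bound

$$E(R) \ge \tilde\mu(\tilde s_R) - \tilde s_R R, \qquad 0 < R \le \tilde\mu'(1);$$

$$E(R) \ge \tilde C - R, \qquad \tilde\mu'(1) \le R \le \mu'(1);$$

$$E(R) \ge \mu(s_R) - s_R R, \qquad \mu'(1) \le R < \bar C,$$

where $\tilde s_R$, $s_R$ are solutions of the equations $\tilde\mu'(\tilde s_R) = R$, $\mu'(s_R) = R$, and

$$\tilde C = \mu(1) = \tilde\mu(1) = -\log\frac{1+\varepsilon^2}{2},$$

$$\tilde\mu'(1) = \tilde\mu(1) + \frac{\varepsilon^2\log\varepsilon^2}{1+\varepsilon^2},$$

$$\mu'(1) = -\frac{(1-\varepsilon)^2\log\frac{1-\varepsilon}{2} + (1+\varepsilon)^2\log\frac{1+\varepsilon}{2}}{(1-\varepsilon)^2 + (1+\varepsilon)^2},$$

$$\bar C = \mu'(0) = -\left[\frac{1-\varepsilon}{2}\log\frac{1-\varepsilon}{2} + \frac{1+\varepsilon}{2}\log\frac{1+\varepsilon}{2}\right].$$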
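The closed-form expressions for the binary pure state channel can be checked numerically. The sketch below assumes NumPy, natural logarithms, and a concrete real representation of the two state vectors with overlap $\varepsilon$ (an arbitrary choice made only for the check); the function names are hypothetical. It compares the eigenvalue formula with a direct diagonalization and returns the rates at which the three branches of the bound meet.

```python
import numpy as np

def eigenvalues_closed_form(pi, eps):
    """Eigenvalues of S_pi = (1 - pi) S_0 + pi S_1 from the formula in the text."""
    root = np.sqrt(1.0 - 4.0 * (1.0 - eps**2) * pi * (1.0 - pi))
    return (1.0 - root) / 2.0, (1.0 + root) / 2.0

def eigenvalues_numeric(pi, eps):
    """Same eigenvalues by direct diagonalization (real vectors, <psi0|psi1> = eps)."""
    psi0 = np.array([1.0, 0.0])
    psi1 = np.array([eps, np.sqrt(1.0 - eps**2)])
    S = (1.0 - pi) * np.outer(psi0, psi0) + pi * np.outer(psi1, psi1)
    return tuple(np.linalg.eigvalsh(S))          # ascending order

def mu_pi(pi, s, eps):
    """mu(pi, s) for the binary channel."""
    l1, l2 = eigenvalues_closed_form(pi, eps)
    return -np.log(l1 ** (1.0 + s) + l2 ** (1.0 + s))

def mu(s, eps):
    """mu(s) = mu(1/2, s)."""
    return -np.log(((1 - eps) / 2) ** (1 + s) + ((1 + eps) / 2) ** (1 + s))

def mu_tilde(s, eps):
    """mu~(s) = mu~(1/2, s)."""
    return -s * np.log(0.5 + 0.5 * eps ** (2.0 / s))

def breakpoints(eps):
    """Return (mu~'(1), mu'(1), cutoff rate C~, entropy bound C-bar)."""
    lo, hi = (1 - eps) / 2, (1 + eps) / 2
    cutoff = -np.log((1 + eps**2) / 2)
    d_mu_tilde_1 = cutoff + eps**2 * np.log(eps**2) / (1 + eps**2)
    d_mu_1 = -(lo**2 * np.log(lo) + hi**2 * np.log(hi)) / (lo**2 + hi**2)
    c_bar = -(lo * np.log(lo) + hi * np.log(hi))
    return d_mu_tilde_1, d_mu_1, cutoff, c_bar

print(breakpoints(0.5))
```

For rates between the first two returned values the bound is the straight line $\tilde C - R$; below and above that interval the expurgated and random coding branches take over.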

The maximal amount of information $C_1$ obtainable with non-entangled (product) measurements is attained for the uniform input probability distribution ($\pi = 1/2$) and the corresponding Bayes (maximum likelihood) decision rule given by the orthonormal basis in $\mathcal H$ oriented symmetrically with respect to the vectors $|\psi_0\rangle$, $|\psi_1\rangle$ (which in this particular case coincides with (27)) [Lev 93], [Osa 98]. It is equal to the capacity of the classical binary symmetric channel with the error probability $(1 - \sqrt{1-\varepsilon^2})/2$, that is

$$C_1 = \frac12\left[(1 + \sqrt{1-\varepsilon^2})\log(1 + \sqrt{1-\varepsilon^2}) + (1 - \sqrt{1-\varepsilon^2})\log(1 - \sqrt{1-\varepsilon^2})\right].$$

A comparison of this quantity with $\bar C$ shows that $C_1 < \bar C$ for $0 < \varepsilon < 1$ (although the difference between the two functions is quite small, see Fig. 3). Since $C \ge \bar C$, this implies the strict superadditivity property $C_n > nC_1$ for the binary pure state channel with $0 < \varepsilon < 1$. However, finding explicit quantum block codes realizing the potential of strict superadditivity seems to be a difficult problem, see [Sas 97].
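The gap between $C_1$ and $\bar C$ is easy to tabulate. A minimal sketch, assuming natural logarithms and NumPy; the function names `c_one` and `c_bar` are illustrative, and the expressions are exactly the binary-channel formulas for $C_1$ and $\bar C = \mu'(0)$.

```python
import numpy as np

def c_bar(eps):
    """Entropy bound for two equiprobable pure states with overlap eps."""
    lo, hi = (1.0 - eps) / 2.0, (1.0 + eps) / 2.0
    return -(lo * np.log(lo) + hi * np.log(hi))

def c_one(eps):
    """One-shot capacity: binary symmetric channel with error (1 - sqrt(1 - eps^2))/2."""
    r = np.sqrt(1.0 - eps**2)
    return 0.5 * ((1.0 + r) * np.log(1.0 + r) + (1.0 - r) * np.log(1.0 - r))

for eps in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"eps={eps:.1f}  C_1={c_one(eps):.4f}  C_bar={c_bar(eps):.4f}")
```

Both quantities tend to $\log 2$ as $\varepsilon \to 0$ (orthogonal states) and to 0 as $\varepsilon \to 1$ (indistinguishable states).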



§4. General signal states with finite entropy 



The general case is substantially more complicated already on the level of quantum 
Bayes problem; in particular, so far no upper bound for the average error probability 
is known, generalizing appropriately the geometrically simple bound (fffi). The proof 
given in [Hol 96] (see also [Sch 97]) is based rather on a noncommutative generalization
of the idea of "jointly typical" sequences in classical theory [Cov 91]. This is realized by
substituting in the average error probability (EH) the decision rule 



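$$X_{w^k} = \Big(\sum_{l=1}^M P P_{w^l} P\Big)^{-1/2} P P_{w^k} P \Big(\sum_{l=1}^M P P_{w^l} P\Big)^{-1/2}, \tag{44}$$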



where $P$ is the projector onto the typical subspace of $S_\pi^{\otimes n}$, with

$$S_\pi = \sum_i \pi_i S_i,$$

and $P_{w^k}$ is a proper generalization of the typical projection for the density operators $S_{w^k}$. Namely, we choose $P_{w^k}$ to be the spectral projection of $S_{w^k}$ corresponding to the eigenvalues $\lambda_J$ in the interval $\big(e^{-n[\bar H_\pi(S_{(\cdot)}) + \delta]},\, e^{-n[\bar H_\pi(S_{(\cdot)}) - \delta]}\big)$. The essential properties of $P_{w^k}$ are

$$P_{w^k} \le S_{w^k}\, e^{n[\bar H_\pi(S_{(\cdot)}) + \delta]}, \tag{45}$$

$$\mathsf{E}\,\mathrm{Tr}\, S_{w^k}(I - P_{w^k}) \le \varepsilon. \tag{46}$$

The operator $\big(\sum_{l=1}^M P P_{w^l} P\big)^{-1/2}$ is to be understood as the generalized inverse of $\big(\sum_{l=1}^M P P_{w^l} P\big)^{1/2}$, equal to 0 on the null subspace of that operator, which contains the range of the projector $I - P$. Denoting by $\hat P$ the projection onto the range of $\sum_{l=1}^M P P_{w^l} P$, we have

$$P P_{w^l} P \le \hat P \le P, \qquad l = 1, \dots, M. \tag{47}$$

The proof given below is somewhat more direct than that in [Hol 96], [Sch 97], making
no use of eigenvectors and spectral decompositions of the signal density operators $S_{w^k}$.

Theorem 3. The capacity of a c-q channel i — > Si satisfying the condition ( |2^ ) is 
given by 

$$C = \bar C = \sup_\pi \Delta H(\pi). \tag{48}$$

Proof. We shall assume that the supremum is finite, otherwise the modification is 
obvious. In view of the argument in §II.3 we have only to show that

$$p(n, e^{nR}) \to 0 \quad \text{for } R < \bar C. \tag{49}$$

To avoid cumbersome notations, we shall further enumerate words by the variable w 
omitting the index k. 

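By denoting $A_w = P_w P\big(\sum_{w'=1}^M P P_{w'} P\big)^{-1/2}$ and using the inequality

$$|\mathrm{Tr}\, S_w A_w|^2 \le \mathrm{Tr}\, S_w A_w^* A_w,$$

we obtain

$$\lambda(W,X) \le \frac1M\sum_{w=1}^M \big[1 - |\mathrm{Tr}\, S_w A_w|^2\big] \le \frac2M\sum_{w=1}^M \big[1 - \mathrm{Tr}\, S_w A_w\big],$$

where $\mathrm{Tr}\, S_w A_w = \mathrm{Tr}\, P S_w P_w P\big(\sum_{w'=1}^M P P_{w'} P\big)^{-1/2}$ is a real number between 0 and 1. Applying the inequality

$$-2x^{-1/2} \le -3 + x, \qquad x > 0,$$

which follows from (37), we obtain by (47)

$$-2\Big(\sum_{w'=1}^M P P_{w'} P\Big)^{-1/2} \le -3\hat P + \sum_{w'=1}^M P P_{w'} P \le -3 P P_w P + \sum_{w'=1}^M P P_{w'} P.$$

Hence

$$\lambda(W,X) \le \frac1M\sum_{w=1}^M\Big[2\,\mathrm{Tr}\, S_w - 3\,\mathrm{Tr}\, S_w P_w P P_w P + \sum_{w'=1}^M \mathrm{Tr}\, S_w P_w P P_{w'} P\Big]$$

$$= \frac1M\sum_{w=1}^M\Big[2\,\mathrm{Tr}\, S_w(I - P_w P P_w P) + \sum_{w' : w' \ne w}\mathrm{Tr}\, S_w P_w P P_{w'} P\Big].$$

Taking into account that

$$\mathrm{Tr}\, S_w(I - P_w P P_w P) = \mathrm{Tr}\, S_w(I - P_w)P P_w P + \mathrm{Tr}\, S_w(I - P)P_w - \mathrm{Tr}\, S_w(I - P)P_w(I - P) + \mathrm{Tr}\, S_w(I - P_w)P + \mathrm{Tr}\, S_w(I - P)$$

$$\le 2\big[\mathrm{Tr}\, S_w(I - P_w) + \mathrm{Tr}\, S_w(I - P)\big],$$

we can write

$$\inf_X \lambda(W,X) \le \frac1M\sum_{w=1}^M\Big\{4\,\mathrm{Tr}\, S_w(I - P) + 4\,\mathrm{Tr}\, S_w(I - P_w) + \sum_{w' : w' \ne w}\mathrm{Tr}\, P S_w P P_{w'}\Big\}, \tag{50}$$

which is our final basic bound.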

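We now again apply Shannon's random coding scheme, assuming that the words $w^1, \dots, w^M$ are chosen at random, independently and with the probability distribution (32) for each word. Then, similarly to (fl5B|), $\mathsf{E} S_w = S_\pi^{\otimes n}$, where $S_\pi = \sum_{i \in A}\pi_i S_i$, and from (50), by independence of $S_w$, $P_{w'}$,

$$\mathsf{E}\inf_X \lambda(W,X) \le 4\,\mathrm{Tr}\, S_\pi^{\otimes n}(I - P) + 4\,\mathsf{E}\,\mathrm{Tr}\, S_w(I - P_w) + (M-1)\,\mathrm{Tr}\, S_\pi^{\otimes n} P\, \mathsf{E} P_{w'}.$$

By the inequalities (f3(J), (46) expressing typicality of the projectors $P$, $P_w$, and by the properties of trace,

$$\mathsf{E}\inf_X \lambda(W,X) \le 8\varepsilon + (M-1)\,\|S_\pi^{\otimes n} P\|\;\mathrm{Tr}\,\mathsf{E} P_{w'},$$

for $n \ge n(\pi, \varepsilon, \delta)$. By the property (poj) of $P$,

$$\|S_\pi^{\otimes n} P\| \le e^{-n[H(S_\pi) - \delta]},$$

and by the property (45) of $P_w$,

$$\mathrm{Tr}\,\mathsf{E} P_{w'} = \mathsf{E}\,\mathrm{Tr}\, P_{w'} \le \mathsf{E}\,\mathrm{Tr}\, S_{w'} \cdot e^{n[\bar H_\pi(S_{(\cdot)}) + \delta]} = e^{n[\bar H_\pi(S_{(\cdot)}) + \delta]}.$$

Thus

$$\mathsf{E}\inf_X \lambda(W,X) \le 8\varepsilon + (M-1)\,e^{-n[H(S_\pi) - \bar H_\pi(S_{(\cdot)}) - 2\delta]}.$$

Let us choose the distribution $\pi = \pi^0$ such that $\Delta H(\pi^0) \ge \bar C - \delta$. Then

$$p(n, M) \le 8\varepsilon + (M-1)\,e^{-n[\bar C - 3\delta]} \tag{51}$$

for $n \ge n(\pi^0, \varepsilon, \delta)$. Thus $p(n, e^{n[\bar C - 4\delta]}) \to 0$ as $n \to \infty$, whence (49) follows. □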

For a quasiclassical channel, where the signal states are given by commuting density operators $S_i$, one can use the classical bound of Theorem 5.6.1 of [Gal 68] with transition probabilities $S(\omega|i)$, where $S(\omega|i)$ are the eigenvalues of $S_i$. In terms of the density operators it takes the form

$$\mathsf{E}\inf_X \lambda(W,X) \le \inf_{0 < s \le 1}(M-1)^s\left(\mathrm{Tr}\Big[\sum_{i \in A}\pi_i S_i^{\frac{1}{1+s}}\Big]^{1+s}\right)^n. \tag{52}$$

The right-hand side of (52) is meaningful for arbitrary density operators, which gives a hope that this estimate, with some modification, could be generalized to the noncommutative case (note that for pure states $S_i$ Theorem 1 gives twice the expression (52)). This would not only give a different proof of Theorem 3, but also a lower bound for the quantum reliability function in the case of general signal states, possibly with infinite entropy.



IV. Quantum channels with constrained inputs 
§1. The case of discrete alphabet 

Importance of quantum channels with constrained inputs was clear from the begin- 
ning of quantum communication; the question "How little energy is needed to send a 
bit?" is formulated more precisely as calculation of the capacity of quantum channel with 
constrained input energy (see | |Gor 5j| , \ \Leb 63, ~UB} , ^Bow 6'7j |, | |C'av 94^ for more physical 
discussion on that point). 

We first consider the case of discrete alphabet $A = \{i\}$. Let $f(i)$ be a nonnegative function defined on the input alphabet. We shall consider the class $\mathcal P_1$ of input distributions $\pi$ satisfying the condition

$$\sum_i f(i)\pi_i \le E, \tag{53}$$

where $E$ is a real number.

We put the additive constraint onto the input words $w = (i_1, \dots, i_n)$ by asking

$$f(i_1) + \dots + f(i_n) \le nE, \tag{54}$$

and denote by $\mathcal P_n$ the class of probability distributions satisfying the corresponding condition

$$\sum_{i_1, \dots, i_n}\big[f(i_1) + \dots + f(i_n)\big]\,\pi_{i_1, \dots, i_n} \le nE. \tag{55}$$

Now the quantities $C$, $\bar C$ can be defined as in §II.2 with the modification that the suprema with respect to $\pi$ are taken over $\mathcal P_n$, that is

$$C = \lim_{n \to \infty} C_n/n, \qquad \bar C = \lim_{n \to \infty} \bar C_n/n,$$

where

$$C_n = \sup_{\pi \in \mathcal P_n}\sup_X I_n(\pi, X), \qquad \bar C_n = \sup_{\pi \in \mathcal P_n}\Delta H_n(\pi),$$

and $I_n(\pi, X)$, $\Delta H_n(\pi)$ are the analogues of the mutual information (Q) and the entropy bound (|) for the product c-q channel in $\mathcal H^{\otimes n}$.

Let us remark that just as it was in the case of unconstrained inputs, the sequence $\bar C_n$ is additive and hence

$$\bar C = \sup_{\pi \in \mathcal P_1}\Delta H(\pi). \tag{56}$$

Indeed, it is sufficient to check that

$$\bar C_n \le n\bar C_1. \tag{57}$$

By the subadditivity of quantum entropy with respect to tensor products,

$$\Delta H_n(\pi) \le \sum_{k=1}^n \Delta H(\pi^{(k)}),$$

where $\pi^{(k)}$ is the $k$-th marginal distribution of $\pi$ on $A$. Also

$$\sum_{k=1}^n \Delta H(\pi^{(k)}) \le n\,\Delta H(\bar\pi),$$

where $\bar\pi = \frac1n\sum_{k=1}^n \pi^{(k)}$, since $\Delta H(\pi)$ is a concave function of $\pi$ [Hol 73]. The inequality (55) can be rewritten as

$$\frac1n\sum_{k=1}^n\sum_{i_k} f(i_k)\,\pi^{(k)}(i_k) \le E,$$

which implies that $\bar\pi \in \mathcal P_1$ if $\pi \in \mathcal P_n$. Taking the supremum with respect to $\pi \in \mathcal P_n$ proves (57).



Theorem 4. The capacity of a c-q channel i — ► Si satisfying the condition ( |2^ ) with 
the input constraint is equal to (56).



Proof. We have to show that if $\bar C < \infty$ and $R > \bar C$, then $p(n, e^{nR}) \not\to 0$, and if the condition (pip holds and $R < \bar C$, then $p(n, e^{nR}) \to 0$.

The proof of the first statement (the converse coding theorem) is based on the following modification of the inequality (p3|):

$$\log M \cdot (1 - p(n, M)) \le \sup_{\pi \in \mathcal P_n}\sup_X I_n(\pi, X) + 1. \tag{58}$$

Let again, as in the proof of Theorem 1, $J$ be the classical random variable describing the output of the product channel under the decision rule $X$ if the words in the code $(W, X)$ are taken with the input distribution $\pi_M$ assigning equal probability $1/M$ to each word. Consider the Fano inequality. Since the words in the code satisfy (54), we have $\pi_M \in \mathcal P_n$, and (58) follows by taking the supremum with respect to $(W, X)$. Substituting $M = e^{nR}$ gives the first statement of the Theorem.

In classical information theory direct coding theorems for channels with additive constraints are proved by using random coding with the probability distribution (32) modified with a factor concentrated on words for which the constraint holds close to equality ([Gal 68], Sec. 7.3). The same tool can be applied to quantum channels [Hol 97]. Let $\pi$ be a distribution satisfying (53), and let $\mathsf P$ be a distribution on the set of $M$ words, under which the words are independent and have the probability distribution (32). Let $\nu_n = \mathsf P\big(\frac1n\sum_{k=1}^n f(i_k) \le E\big)$ and define the modified distribution $\tilde{\mathsf P}$ under which the words are still independent but

$$\tilde{\mathsf P}(w = (i_1, \dots, i_n)) = \begin{cases}\nu_n^{-1}\,\pi_{i_1}\cdots\pi_{i_n}, & \text{if } \sum_{k=1}^n f(i_k) \le nE,\\ 0, & \text{otherwise.}\end{cases} \tag{59}$$

Let us remark that since $\pi \in \mathcal P_1$, then $\mathsf E f \le E$ (where $\mathsf E$ ($\tilde{\mathsf E}$) is the expectation corresponding to $\mathsf P$ ($\tilde{\mathsf P}$)) and hence by the central limit theorem

$$\lim_{n \to \infty}\nu_n \ge 1/2.$$

Therefore $\tilde{\mathsf E}\xi \le 2^m\,\mathsf E\xi$ for any nonnegative random variable $\xi$ depending on $m$ words. For the error probability (25) we have the basic upper bound (50). Take the expectation of this bound with respect to $\tilde{\mathsf P}$. Since every summand in the right hand side of (50) depends on no more than two different words, we have

$$\tilde{\mathsf E}\inf_X \lambda(W,X) \le 4\,\mathsf E\inf_X \lambda(W,X),$$

and the expectation with respect to $\mathsf P$ can be made arbitrarily small provided $M = e^{nR}$, $n \to \infty$, with $R < \bar C - 3\delta$. Thus $\tilde{\mathsf E}\inf_X\lambda(W,X)$ also can be made arbitrarily small under the same circumstances. Since the distribution $\tilde{\mathsf P}$ is concentrated on words satisfying (54), we can choose a code for which $\lambda(W, X)$ can be made arbitrarily small. □
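The modified codeword distribution used in this proof is simply the product distribution conditioned on the additive constraint, so it can be sampled by rejection. The sketch below assumes NumPy; the function name, the three-letter alphabet, and the numerical parameters are hypothetical illustrations. The acceptance rate gives a rough estimate of $\nu_n$.

```python
import numpy as np

def sample_constrained_words(pi, f, E, n, M, rng, max_tries=100_000):
    """Draw M words i.i.d. from the product distribution pi^n conditioned on
    f(i_1) + ... + f(i_n) <= n*E, using simple rejection sampling."""
    pi = np.asarray(pi, dtype=float)
    f = np.asarray(f, dtype=float)
    words, tries = [], 0
    while len(words) < M:
        if tries >= max_tries:
            raise RuntimeError("constraint is satisfied too rarely")
        w = rng.choice(len(pi), size=n, p=pi)    # one candidate word
        tries += 1
        if f[w].sum() <= n * E:                  # keep only admissible words
            words.append(w)
    nu_hat = M / tries                           # rough estimate of nu_n
    return np.array(words), nu_hat

rng = np.random.default_rng(0)
words, nu_hat = sample_constrained_words(
    pi=[0.5, 0.3, 0.2], f=[0.0, 1.0, 2.0], E=0.8, n=200, M=50, rng=rng)
print(words.shape, nu_hat)
```

Because $\nu_n$ stays bounded away from zero, conditioning inflates the expectation of a nonnegative function of $m$ words by at most the constant factor $\nu_n^{-m}$, which is the only property of the modified distribution the proof needs.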



§2. The case of continuous alphabet 

In this section we take as the input alphabet $A$ an arbitrary Borel subset in a finite-dimensional Euclidean space $\mathcal E$.

We assume that a nonnegative Borel function $f$ on $\mathcal E$ is fixed and consider the set $\mathcal P_1$ of probability measures $\pi$ on $A$ satisfying

$$\int_A f(x)\,\pi(dx) \le E. \tag{60}$$

We impose the additive constraint onto transmitted words $w = (x_1, \dots, x_n)$ by requiring

$$f(x_1) + \dots + f(x_n) \le nE. \tag{61}$$

Like in the classical case, we discretize the channel by taking a priori distributions with discrete supports

$$\pi(dx) = \sum_i \pi_i\,\delta_{x_i}(dx), \tag{62}$$

where $\{x_i\} \subset A$ is an arbitrary countable collection of points and

$$\delta_x(B) = \begin{cases}1, & \text{if } x \in B,\\ 0, & \text{if } x \notin B.\end{cases}$$

For $\pi$ of the form (62) the condition (60) takes the form (53). Denoting by $\mathcal P_1'$ the class of all such probability distributions, we can directly extend the argument of Theorem 4, with the capacity given by

$$C = \bar C = \sup_{\pi \in \mathcal P_1'}\Delta H(\pi). \tag{63}$$

We now assume that the channel is given by a weakly continuous mapping $x \to S_x$ from the input alphabet $A$ to the set of density operators in $\mathcal H$. (The weak continuity means continuity of all matrix elements $\langle\psi|S_x\varphi\rangle$; $\psi, \varphi \in \mathcal H$.) For arbitrary $\pi$ consider the quantity

$$\Delta H(\pi) = \int_A H(S_x; S_\pi)\,\pi(dx), \tag{64}$$

where

$$S_\pi = \int_A S_x\,\pi(dx). \tag{65}$$

Because of the weak continuity of the function $S_x$ the integral is well defined and represents a density operator in $\mathcal H$. Moreover, the nonnegative function $H(S_x; S_\pi)$ is lower semicontinuous (see [Weh 78]), and hence the integral in (64) is also well defined. We also introduce the analog of the condition

$$\sup_{x \in A} H(S_x) < \infty. \tag{66}$$

Under this condition the representation (H) holds with

$$\bar H_\pi(S_{(\cdot)}) = \int_A H(S_x)\,\pi(dx).$$



Proposition 2. Under the assumption that the mapping $x \to S_x$ is weakly continuous, the function $f$ is lower semicontinuous, and the condition (66) holds,

$$C = \sup_{\pi \in \mathcal P_1}\Delta H(\pi). \tag{67}$$

Proof. In view of (63) we have only to show that

$$\sup_{\pi \in \mathcal P_1'}\Delta H(\pi) \ge \sup_{\pi \in \mathcal P_1}\Delta H(\pi).$$

It is sufficient to construct, for arbitrary $\pi \in \mathcal P_1$, a sequence of $\pi^{(l)} \in \mathcal P_1'$ such that

$$\liminf_{l \to \infty}\Delta H(\pi^{(l)}) \ge \Delta H(\pi). \tag{68}$$

To this end for any $l = 1, 2, \dots$ we consider the division of $A$ into disjoint subsets

$$B_k^{(l)} = \{x : k/l \le H(S_x) < (k+1)/l\}, \qquad k = \dots, -1, 0, 1, \dots. \tag{69}$$

By making, if necessary, a finer subdivision, we can always assume that the diameters of all sets $B_k^{(l)}$ are bounded from above by $\varepsilon_l$, where $\varepsilon_l \to 0$ as $l \to \infty$. Let $x_k^{(l)}$ be a point at which $f(x)$ achieves its minimum on the closure of $B_k^{(l)}$, and define

$$\pi^{(l)}(dx) = \sum_k \pi(B_k^{(l)})\,\delta_{x_k^{(l)}}(dx),$$

where $\pi$ is a fixed distribution from $\mathcal P_1$. Then

$$\int_A f(x)\,\pi^{(l)}(dx) \le \int_A f(x)\,\pi(dx), \tag{70}$$

hence $\pi^{(l)} \in \mathcal P_1'$. By construction (69) we have

$$\left|\int_A H(S_x)\,\pi^{(l)}(dx) - \int_A H(S_x)\,\pi(dx)\right| \le 1/l. \tag{71}$$

Let us show that

$$\liminf_{l \to \infty} H\Big(\int_A S_x\,\pi^{(l)}(dx)\Big) \ge H\Big(\int_A S_x\,\pi(dx)\Big). \tag{72}$$



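To this end we remark that due to the weak continuity and uniform boundedness of the function $S_x$, the operators $\int_A S_x\,\pi^{(l)}(dx)$ weakly converge to the operator $\int_A S_x\,\pi(dx)$. Indeed, let $B_c$ be the ball of radius $c$ in $\mathcal E$. Then

$$\left|\Big\langle\varphi\Big|\int_A S_x\,\pi^{(l)}(dx)\,\psi\Big\rangle - \Big\langle\varphi\Big|\int_A S_x\,\pi(dx)\,\psi\Big\rangle\right| \le \sum_k\int_{B_k^{(l)} \cap B_c}\big|\langle\varphi|S_{x_k^{(l)}}\psi\rangle - \langle\varphi|S_x\psi\rangle\big|\,\pi(dx) + 2\,\|\varphi\|\,\|\psi\|\,\pi(A \setminus B_c).$$

By choosing first $c$ large enough to make the second term small, we can make the first term small for all large enough $l$, since $\langle\varphi|S_x\psi\rangle$ is uniformly continuous on $A \cap B_c$ and the diameters of $B_k^{(l)}$ uniformly tend to zero. The relation (72) then follows from the lower semicontinuity of the quantum entropy. Relations (71), (72) imply (68). □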



§3. The upper bounds for error probability 

Much more detailed information concerning the rate of convergence of the error probability can be obtained for pure state channels, by modifying the estimates from §III.2 to channels with infinite alphabets and constrained inputs following the method of [Gal 68], Ch. 7. We start with the case of discrete alphabet $A$.

Let $S_i = |\psi_i\rangle\langle\psi_i|$ be the pure signal states of the channel, and let $\pi$ be an a priori distribution satisfying the condition (53). Then the following random coding bound holds for the error probability $p(n, M)$, where $M = e^{nR}$ with $R < \bar C$:



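$$p(n, e^{nR}) \le 2\Big(\frac{e^{p\delta}}{\nu_{n,\delta}}\Big)^2\exp\{-n[\mu(\pi, s, p) - sR]\}, \tag{73}$$

where

$$\mu(\pi, s, p) = -\log\mathrm{Tr}\Big[\sum_i \pi_i\,e^{p[f(i) - E]}\,S_i\Big]^{1+s}, \tag{74}$$

and $0 \le s \le 1$, $0 \le p$, $0 < \delta$ are arbitrary parameters. The quantity

$$\nu_{n,\delta} = \mathsf P\Big(nE - \delta \le \sum_{k=1}^n f(i_k) \le nE\Big)$$

satisfies $\lim_{n \to \infty}\sqrt n\,\nu_{n,\delta} > 0$, thus adding only $o(n)$ to the exponent in (73).

The bound (73) is obtained in the same way as Theorem 2, that is by evaluating the expectation of the average error probability (p6|) using random, independently chosen codewords, but with the modified codeword distribution

$$\tilde{\mathsf P}(w = (i_1, \dots, i_n)) = \begin{cases}\nu_{n,\delta}^{-1}\,\pi_{i_1}\cdots\pi_{i_n}, & \text{if } nE - \delta \le \sum_{k=1}^n f(i_k) \le nE,\\ 0, & \text{otherwise.}\end{cases} \tag{75}$$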

By using independence of the words, we can repeat the first part of the proof of Theorem 
2 to show that 



Now for any p > 0, 



-2 E 5 G 1/2 < -2E 5 G + \ _ Q) = —2ME S S W + I l){E 5 S w f 



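$$\tilde{\mathsf E} S_w \le \frac{e^{p\delta}}{\nu_{n,\delta}}\,\mathsf E\exp\Big\{p\sum_{k=1}^n[f(i_k) - E]\Big\}S_w = \frac{e^{p\delta}}{\nu_{n,\delta}}\Big[\sum_i \pi_i\,e^{p[f(i) - E]}\,S_i\Big]^{\otimes n} \equiv \check S.$$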

Then we obtain for < s < 1 

-^(M- E 5 G 1/2 ) < 2(M - l) s TrS 1+s < 2 (^-^ f Tr | II e^^"^^^ | 



whence the bound ([73]) follows 



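In the same way, the expurgated bound from §III.2 can be modified to obtain

$$p(n, e^{nR}) \le \exp\Big\{-n\Big[\tilde\mu(\pi, s, p) - s\Big(R + \frac2n\log\frac{2e^{p\delta}}{\nu_{n,\delta}}\Big)\Big]\Big\}, \tag{76}$$

where

$$\tilde\mu(\pi, s, p) = -s\log\sum_{i,k}\pi_i\pi_k\,e^{p[f(i) + f(k) - 2E]}\,|\langle\psi_i|\psi_k\rangle|^{2/s}. \tag{77}$$

These bounds can be extended to pure state channels with continuous alphabets by using the technique of §2 to obtain (73), (76) with

$$\mu(\pi, s, p) = -\log\mathrm{Tr}\Big[\int_A e^{p[f(x) - E]}\,S_x\,\pi(dx)\Big]^{1+s}, \tag{78}$$

$$\tilde\mu(\pi, s, p) = -s\log\int_A\int_A e^{p[f(x) + f(y) - 2E]}\,|\langle\psi_x|\psi_y\rangle|^{2/s}\,\pi(dx)\,\pi(dy). \tag{79}$$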

Introducing the reliability function (38), we get the lower bound for $E(R)$:

$$E(R) \ge \max\{E_r(R), E_{ex}(R)\},$$

where

$$E_r(R) = \sup_{0 < s \le 1}\Big(\sup_{0 \le p}\,\sup_{\pi \in \mathcal P_1}\mu(\pi, s, p) - sR\Big), \tag{80}$$

$$E_{ex}(R) = \sup_{1 \le s}\Big(\sup_{0 \le p}\,\sup_{\pi \in \mathcal P_1}\tilde\mu(\pi, s, p) - sR\Big). \tag{81}$$

An example where the maximization at least partially can be performed analytically will be considered in §V.2.

V. Quantum Gaussian channels 

§1. Gaussian memoryless channel in one mode. 

For a simple introduction to Gaussian states see e.g. [Hol 80], Ch. V. To make the
presentation self-contained, we include proofs of some well known results (such as Lemma
1 below).

Let $A$ be the complex plane $\mathbb C$, and let for every $\alpha \in \mathbb C$ the density operator $S_\alpha$ describe the thermal state of a harmonic oscillator with the signal amplitude $\alpha$ and the mean number of the noise quanta $N$, i.e.

$$S_\alpha = \frac{1}{\pi N}\int\exp\Big(-\frac{|z - \alpha|^2}{N}\Big)|z\rangle\langle z|\,d^2z, \tag{82}$$

where $|z\rangle$ are the coherent state vectors. By introducing the creation-annihilation operators $a^\dagger$, $a$, we have

$$\mathrm{Tr}\, S_\alpha a = \alpha, \qquad \mathrm{Tr}\, S_\alpha a^\dagger a = N + |\alpha|^2. \tag{83}$$

We remind that

$$S_\alpha = V(\alpha)S_0V(\alpha)^*,$$

where

$$V(\alpha) = \exp(\alpha a^\dagger - \bar\alpha a)$$

are the unitary displacement operators, and the operator $S_0$ has the spectral representation:

$$S_0 = \frac{1}{N+1}\sum_{n=0}^\infty\Big(\frac{N}{N+1}\Big)^n|n\rangle\langle n|, \tag{84}$$

where $|n\rangle$ are the eigenvectors of the number operator $a^\dagger a$, corresponding to eigenvalues $n = 0, 1, \dots$. Hence the states (82) all have the same entropy

$$H(S_\alpha) = (N+1)\log(N+1) - N\log N \equiv g(N), \tag{85}$$

where $g(x)$ is a continuous concave monotonically increasing function of $x \ge 0$. It is well known that the states $S_\alpha$ have maximal entropy among all states satisfying (83). This follows from

Lemma 1. The operator (84) has maximal entropy among all density operators $S$ satisfying

$$\mathrm{Tr}\, S a^\dagger a \le N. \tag{86}$$

Proof. Denote by $S'$ the density operator which is diagonal in the basis $\{|n\rangle\}$ with the elements $s_n = \langle n|S|n\rangle$ on the diagonal. This operator satisfies (86), and

$$H(S') - H(S) = H(S; S') \ge 0.$$

Therefore the maximum is achieved on the diagonal operators. One must maximize $H(S) = -\sum_n s_n\log s_n$ under the conditions $s_n \ge 0$, $\sum_n s_n = 1$, and (86), which becomes $\sum_n n s_n \le N$, and the solution (84) follows by application of the Lagrange method. □
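Lemma 1 reduces to a classical statement: among distributions on $\{0, 1, \dots\}$ with mean $N$, the geometric distribution (84) has the largest entropy, equal to $g(N)$. The following sketch, assuming NumPy, natural logarithms and a truncation of the number basis at `n_max` (a numerical convenience, not part of the paper), checks this against a Poisson distribution with the same mean.

```python
import math
import numpy as np

def g(x):
    """g(x) = (x + 1) log(x + 1) - x log x, with g(0) = 0."""
    return (x + 1.0) * math.log(x + 1.0) - (x * math.log(x) if x > 0 else 0.0)

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def geometric(N, n_max):
    """Diagonal of the operator (84): s_n = (1/(N+1)) (N/(N+1))^n, n <= n_max."""
    n = np.arange(n_max + 1)
    return (1.0 / (N + 1.0)) * (N / (N + 1.0)) ** n

def poisson(N, n_max):
    """Poisson distribution with mean N, used only as a competitor."""
    n = np.arange(n_max + 1)
    log_p = -N + n * math.log(N) - np.array([math.lgamma(k + 1.0) for k in n])
    return np.exp(log_p)

N = 2.0
print(entropy(geometric(N, 400)), g(N), entropy(poisson(N, 400)))
```

The first two printed numbers agree, while the Poisson entropy is strictly smaller, as the lemma requires.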

Operator (84) satisfies the conditions (83) (with $\alpha = 0$), therefore it also has the maximal entropy among such density operators.

Let us consider the channel $\alpha \to S_\alpha$, which is the quantum analog of the memoryless channel with additive Gaussian noise (see [Gor 64], [Hel 76], \\tiol 7/f|). The condition (66) is trivially satisfied, and for any input distribution $\pi(d^2\alpha)$

$$\Delta H(\pi) = H(S_\pi) - H(S_0),$$

where

$$S_\pi = \int S_\alpha\,\pi(d^2\alpha).$$

By choosing $f(\alpha) = |\alpha|^2$, we impose the additive constraint of the type (61). Thus $\mathcal P_1$ is defined as

$$\int|\alpha|^2\,\pi(d^2\alpha) \le E, \tag{87}$$

where $\pi(d^2\alpha)$ is an input distribution. In fact, $E$ is the "mean number of quanta" in the signal, which is proportional to energy for one mode. The constraint (87) by virtue of (83) implies

$$\mathrm{Tr}\, S_\pi a^\dagger a \le N + E. \tag{88}$$

According to Lemma 1 the maximal entropy

$$H(S_\pi) = g(N + E) \tag{89}$$

is attained by the Gaussian density operator

$$S_\pi = \frac{1}{\pi(N+E)}\int\exp\Big(-\frac{|z|^2}{N+E}\Big)|z\rangle\langle z|\,d^2z, \tag{90}$$

corresponding to the optimal distribution

$$\pi(d^2\alpha) = \frac{1}{\pi E}\exp\Big(-\frac{|\alpha|^2}{E}\Big)d^2\alpha. \tag{91}$$

By Proposition 2 the capacity of the memoryless Gaussian channel is given by

$$C = \bar C = g(N + E) - g(N) = \log\Big(1 + \frac{E}{N+1}\Big) + (N+E)\log\Big(1 + \frac{1}{N+E}\Big) - N\log\Big(1 + \frac1N\Big).$$

This quantity was anticipated in [Gor 64] (relation (4.28)) as an upper bound for the information transmitted by the quantum Gaussian channel. On the other hand, for a long time this quantity was also known as the capacity of the "narrow band photon channel" [Gor 62], [Leb 63, 66], [Cav 94]. Our argument based on Theorem 4 and Proposition 2 gives for the first time the proof of the asymptotic equivalence, in the sense of information capacity, of the memoryless Gaussian channel with the energy constraint to this quasiclassical channel. To make the point clear, we give below a simplified one-mode description of the photon channel.
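The one-mode capacity formula above is straightforward to evaluate. A minimal sketch, assuming natural logarithms (so the result is in nats); the function names are illustrative. It computes $g(N+E) - g(N)$ directly and through the expanded three-term expression, which should agree for $N > 0$.

```python
import math

def g(x):
    """g(x) = (x + 1) log(x + 1) - x log x, with g(0) = 0."""
    return (x + 1.0) * math.log(x + 1.0) - (x * math.log(x) if x > 0 else 0.0)

def capacity(N, E):
    """C = g(N + E) - g(N): noise quanta N, mean signal quanta E."""
    return g(N + E) - g(N)

def capacity_expanded(N, E):
    """The same quantity written as three logarithmic terms (requires N > 0)."""
    return (math.log(1.0 + E / (N + 1.0))
            + (N + E) * math.log(1.0 + 1.0 / (N + E))
            - N * math.log(1.0 + 1.0 / N))

print(capacity(0.5, 2.0), capacity_expanded(0.5, 2.0))
print(capacity(0.0, 2.0), g(2.0))     # pure state case N = 0
```

Since $g$ is concave, the capacity at fixed $E$ decreases as the noise level $N$ grows.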

Consider the discrete family of states

$$S_m = P(m)S_0P(m)^*, \qquad m = 0, 1, \dots, \tag{92}$$

where $P(m)$ is the energy shift operator satisfying $P(m)|n\rangle = |n + m\rangle$. Notice that $P(m) = P^m$, where $P$ is the isometric operator adjoint to the quantum-mechanical "phase operator" [Hol 80]. The states $S_m$ all have the same entropy (85) as the states $S_\alpha$, and the mean number of quanta

$$\mathrm{Tr}\, S_m a^\dagger a = N + m. \tag{93}$$

Moreover, all states (92) are diagonal in the number representation, so the channel is quasiclassical.

Imposing the constraint

$$\sum_{m=0}^\infty m\,\pi_m \le E, \tag{94}$$

where $\pi_m$ is an input distribution, and introducing the density operator

$$S_\pi = \sum_{m=0}^\infty \pi_m S_m,$$

by virtue of (93), we obtain the same constraint (88) for the new operator $S_\pi$. The maximal entropy (89) is again attained by the operator (90), which has the spectral representation

$$S_\pi = \frac{1}{N+E+1}\sum_{n=0}^\infty\Big(\frac{N+E}{N+E+1}\Big)^n|n\rangle\langle n|.$$

It corresponds to the optimal signal distribution [Leb 63, 66]

$$\pi_m = \frac{N}{N+E}\,\delta_{m0} + \frac{E}{N+E}\cdot\frac{1}{N+E+1}\Big(\frac{N+E}{N+E+1}\Big)^m.$$
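Since all states $S_m$ are diagonal in the number basis, the output distribution of the photon channel is the convolution of the input distribution $\pi_m$ with the geometric noise distribution (84). The sketch below, assuming NumPy and a truncation at `n_max` (a numerical convenience), checks that the optimal signal distribution above reproduces the geometric output with mean $N + E$ and saturates the constraint (94).

```python
import numpy as np

def geometric(mean, n_max):
    """Geometric (thermal) distribution with the given mean on {0, ..., n_max}."""
    n = np.arange(n_max + 1)
    return (1.0 / (mean + 1.0)) * (mean / (mean + 1.0)) ** n

def optimal_input(N, E, n_max):
    """pi_m = N/(N+E) delta_{m0} + E/(N+E) * geometric(N+E)_m."""
    pi = (E / (N + E)) * geometric(N + E, n_max)
    pi[0] += N / (N + E)
    return pi

def output_distribution(pi, N):
    """Diagonal of sum_m pi_m S_m: shifting by m and adding noise is a convolution."""
    noise = geometric(N, len(pi) - 1)
    return np.convolve(pi, noise)[: len(pi)]

N, E, n_max = 1.5, 2.0, 300
pi = optimal_input(N, E, n_max)
print(np.abs(output_distribution(pi, N) - geometric(N + E, n_max)).max())
print((np.arange(n_max + 1) * pi).sum())       # mean number of signal quanta
```

The first printed value is at the level of rounding error and the second is equal to $E$ up to the truncation error.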



There is a notable difference between the case of pure state channel and the general case. For a pure state case (where $N = 0$), one can formulate a broader problem of finding a maximum capacity channel $x \to S_x$ with arbitrary alphabet $\{x\}$ and a signal distribution $\pi(dx)$ satisfying the output constraint

$$\mathrm{Tr}\, S_\pi a^\dagger a \le E.$$

This was done in [Yue 93] where it was shown that the noiseless photon channel provides a solution to this problem. In view of the result of [Hau 96], any other pure state channel satisfying

$$\int S_x\,\pi(dx) = \frac{1}{E+1}\sum_{n=0}^\infty\Big(\frac{E}{E+1}\Big)^n|n\rangle\langle n|$$

gives, asymptotically, a solution to the same problem. However, in the general case imposing the output constraint (88) instead of the input constraints (87) or (94) looks rather artificial; the equivalence of these constraints for apparently different channels seems to be a very special feature of the quantum Gaussian density operators.



§2. The reliability function of Gaussian pure state channel 

We are going to apply results of §IV.3 to the Gaussian pure state channel $\alpha \to S_\alpha = |\alpha\rangle\langle\alpha|$ with the constraint (87). By taking the optimal a priori distribution (91) we can calculate explicitly the functions (78), (79).

Namely, to calculate (78), we remark that

$$\int e^{p(|\alpha|^2 - E)}|\alpha\rangle\langle\alpha|\,\pi(d^2\alpha) = \frac{e^{-pE}}{\pi E}\int e^{-|\alpha|^2/E'}|\alpha\rangle\langle\alpha|\,d^2\alpha = \frac{e^{-pE}}{1 - pE}\cdot\frac{1}{E'+1}\sum_{n=0}^\infty\Big(\frac{E'}{E'+1}\Big)^n|n\rangle\langle n|,$$

where $E' = E/(1 - pE)$, provided $p < E^{-1}$, and the trace of the $(1+s)$-th power of this operator is easily calculated to yield

$$\mu(\pi, s, p) = (1+s)pE + \log\big[(1 + E - pE)^{1+s} - E^{1+s}\big]. \tag{96}$$
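Formula (96) can be checked against a direct summation over the number basis. The following sketch assumes NumPy, natural logarithms and a truncation at `n_max`; the function names are hypothetical. It also verifies the stationarity condition $(1 + E - pE)^s(1 - p) = E^s$ at $s = 1$, where the solution can be written as $p = 1 + 1/E - q(E)/E$ with $q(E) = (1 + \sqrt{4E^2 + 1})/2$.

```python
import numpy as np

def mu_closed(s, p, E):
    """Right-hand side of (96)."""
    return (1 + s) * p * E + np.log((1 + E - p * E) ** (1 + s) - E ** (1 + s))

def mu_numeric(s, p, E, n_max=5000):
    """-log Tr of the (1+s)-th power of the diagonal operator, summed directly."""
    E1 = E / (1.0 - p * E)                        # E' in the text, needs p < 1/E
    n = np.arange(n_max + 1)
    lam = np.exp(-p * E) / (1.0 - p * E) / (E1 + 1.0) * (E1 / (E1 + 1.0)) ** n
    return -np.log(np.sum(lam ** (1.0 + s)))

def q(E):
    return (1.0 + np.sqrt(4.0 * E**2 + 1.0)) / 2.0

def p_opt_s1(E):
    """Maximizer of mu(pi, 1, p) over p."""
    return 1.0 + 1.0 / E - q(E) / E

E = 1.0
print(mu_closed(0.5, 0.3, E), mu_numeric(0.5, 0.3, E))
print(mu_closed(1.0, p_opt_s1(E), E), 2 * (E + 1 - q(E)) + np.log(q(E)))
```

For other values of $s$ in $(0, 1)$ no closed form is available and the maximization over $p$ has to be done numerically, for instance by a one-dimensional search on $[0, 1/E)$.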
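By taking into account that

$$|\langle z|w\rangle|^2 = e^{-|z - w|^2}$$

(see, e.g. [Hel 76]), we can calculate the integral in (79) as

$$\frac{e^{-2pE}}{(\pi E)^2}\iint\exp\big\{-\big[(E^{-1} + s^{-1} - p)|z|^2 + (E^{-1} + s^{-1} - p)|w|^2 - 2s^{-1}\,\mathrm{Re}\,\bar z w\big]\big\}\,d^2z\,d^2w = \frac{e^{-2pE}}{1 + p^2E^2 - 2pE - 2pE^2/s + 2E/s},$$

for $p < E^{-1}$, whence

$$\tilde\mu(\pi, s, p) = s\big\{2pE + \log\big[1 + p^2E^2 - 2pE + 2E(1 - pE)/s\big]\big\}. \tag{97}$$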
Trying to maximize $\mu(\pi, s, p)$ with respect to $p$ we obtain the equation

$$(1 + E - pE)^s(1 - p) = E^s, \tag{98}$$

which can be solved explicitly only for $s = 0, 1$. Thus, contrary to the classical case [Gal 68], the maximum in (80) in general can be found only numerically. For $s = 0$ we have $p = 0$ and

$$C = \frac{\partial}{\partial s}\mu(\pi, 0, 0) = (E+1)\log(E+1) - E\log E.$$

For $s = 1$ equation (98) has the unique solution $p(1, E) = 1 + 1/E - q(E)/E < E^{-1}$, where

$$q(E) = \frac{1 + \sqrt{4E^2 + 1}}{2}.$$

For future use we find the important quantities

$$\mu(\pi, 1, p(1, E)) = 2(E + 1 - q(E)) + \log q(E);$$

$$\frac{\partial}{\partial s}\mu(\pi, 1, p(1, E)) = E + 1 - q(E) + \frac{q(E)^2\log q(E) - E^2\log E}{q(E)}. \tag{99}$$

The optimization of the expurgated bound can be performed analytically. Taking the partial derivative with respect to $p$ we obtain the equation

$$p^2 - 2p\Big(\frac1s + \frac{1}{2E}\Big) + \frac{1}{sE} = 0,$$

the solution of which, satisfying $p < E^{-1}$, is

$$p(s, E) = s^{-1} + E^{-1} - E^{-1}q(E/s).$$

Substituting this in (81), we obtain the following expression, which is to be maximized with respect to $s \ge 1$:

$$\tilde\mu(\pi, s, p(s, E)) - sR = 2(E + s - s\,q(E/s)) + s\log q(E/s) - sR.$$

Taking the derivative with respect to $s$, we obtain the equation

$$q(E/s) = e^R,$$

the solution of which is

$$s = \frac{E}{\sqrt{e^{2R} - e^R}}. \tag{100}$$

If this is not less than 1, which is equivalent to

$$R \le \log q(E) = \frac{\partial}{\partial s}\tilde\mu(\pi, 1, p(1, E)),$$

then the maximum is achieved for the value of $s$ given by (100) and is equal to

$$2E\big(1 - \sqrt{1 - e^{-R}}\big) = E_{ex}(R) > E_r(R)$$

(which up to a factor coincides with the expurgated bound for the classical Gaussian channel). In the range

$$\frac{\partial}{\partial s}\tilde\mu(\pi, 1, p(1, E)) \le R \le \frac{\partial}{\partial s}\mu(\pi, 1, p(1, E)),$$

where the optimizing $s$ is equal to 1, we have the linear bound

$$E_{ex}(R) = E_r(R) = \mu(\pi, 1, p(1, E)) - R,$$

with the quantities $\frac{\partial}{\partial s}\mu(\pi, 1, p(1, E))$, $\mu(\pi, 1, p(1, E))$ defined by (99). Finally, in the range

$$\frac{\partial}{\partial s}\mu(\pi, 1, p(1, E)) < R < C$$

we have $E_{ex}(R) < E_r(R)$ with $E_r(R)$ given implicitly by (80).
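The analytic optimization of the expurgated bound can be cross-checked by brute force. The sketch below assumes NumPy and natural logarithms; the function names and the grid search over $s \ge 1$ are illustrative. It uses formula (97) for $\tilde\mu$ with the optimal $p(s, E)$ and compares the numerical maximum with the closed form $2E(1 - \sqrt{1 - e^{-R}})$ for a rate below $\log q(E)$.

```python
import numpy as np

def q(x):
    return (1.0 + np.sqrt(4.0 * x**2 + 1.0)) / 2.0

def mu_tilde(s, p, E):
    """Formula (97)."""
    return s * (2 * p * E + np.log(1 + (p * E) ** 2 - 2 * p * E + 2 * E * (1 - p * E) / s))

def p_opt(s, E):
    """Maximizer of mu~ over p, satisfying p < 1/E."""
    return 1.0 / s + 1.0 / E - q(E / s) / E

def e_ex_closed(R, E):
    """Closed-form expurgated exponent, valid for R <= log q(E)."""
    return 2.0 * E * (1.0 - np.sqrt(1.0 - np.exp(-R)))

def e_ex_numeric(R, E, s_max=50.0, grid=400001):
    """Maximize mu~(s, p(s, E)) - s R over a grid of s in [1, s_max]."""
    s = np.linspace(1.0, s_max, grid)
    vals = mu_tilde(s, p_opt(s, E), E) - s * R
    k = int(np.argmax(vals))
    return vals[k], s[k]

E, R = 1.0, 0.2                         # R < log q(1), about 0.481
val, s_hat = e_ex_numeric(R, E)
print(val, e_ex_closed(R, E))
print(s_hat, E / np.sqrt(np.exp(2 * R) - np.exp(R)))
```

The maximizing $s$ found on the grid agrees with (100), and it exceeds 1 precisely because the chosen rate lies below $\log q(E)$.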

On the other hand, for the pure state photon channel the analysis of the error proba- 
bility is trivial: since this is quasiclassical noiseless channel, the error probability is zero 
for R < C. Thus, although the two channels are asymptotically equivalent in the sense 
of capacity, their finer asymptotic properties are apparently essentially different. 

§3. The case of many modes 

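Consider now a finite collection of harmonic oscillators with frequencies $\omega_j$, described by creation-annihilation operators $a_j^\dagger$, $a_j$; $j = 1, \dots$. Let $\alpha_j \in \mathbb C$ and let $S_j(\alpha_j)$ be the Gaussian state (82) with

$$\mathrm{Tr}\, S_j(\alpha_j)a_j = \alpha_j, \qquad \mathrm{Tr}\, S_j(\alpha_j)a_j^\dagger a_j = N_j + |\alpha_j|^2. \tag{101}$$

Denoting $\alpha = (\alpha_j)$ we consider the tensor product states

$$S_\alpha = \otimes_j S_j(\alpha_j), \tag{102}$$

and we are interested in the memoryless quantum Gaussian channel $\alpha \to S_\alpha$ satisfying the additive constraint (61), where $f$ is the energy of the signal

$$f(\alpha) = \sum_j \hbar\omega_j|\alpha_j|^2.$$

Note that according to (85), the entropy of the states (102) is equal to

$$H(S_\alpha) = H(S_0) = \sum_j H(S_j(0)) = \sum_j g(N_j). \tag{103}$$

We shall denote by

$$N_j(\theta) = \frac{1}{e^{\theta\hbar\omega_j} - 1} \tag{104}$$

the Planck distribution which maximizes the entropy (103) under the constraint

$$\sum_j \hbar\omega_j N_j \le \mathrm{const},$$

and remark that

$$g(N_j(\theta)) = \frac{\theta\hbar\omega_j}{e^{\theta\hbar\omega_j} - 1} - \log\big(1 - e^{-\theta\hbar\omega_j}\big). \tag{105}$$

Let $\mathcal P_1$ be the set of probability distributions $\pi(d^2\alpha)$ satisfying

$$\int\sum_j \hbar\omega_j|\alpha_j|^2\,\pi(d^2\alpha) \le E. \tag{106}$$

We also use the notation $(y)_+ = \max(y, 0)$.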

Proposition 3. The capacity of the memoryless quantum Gaussian channel with the additive constraint onto the signal energy is equal to

$$C = \sum_j\big(g(N_j(\theta)) - g(N_j)\big)_+, \tag{107}$$

where $\theta$ is chosen such that

$$\sum_j \hbar\omega_j\big(N_j(\theta) - N_j\big)_+ = E. \tag{108}$$

Proof. Proposition 2 applies, so we have only to calculate the supremum in the right hand side of (67) with $\mathcal P_1$ given by (106). Let us show that it is achieved on the Gaussian probability distribution

$$\pi(d^2\alpha) = \prod_j\frac{1}{\pi m_j^*}\exp\Big(-\frac{|\alpha_j|^2}{m_j^*}\Big)d^2\alpha_j, \tag{109}$$

where

$$m_j^* = \big(N_j(\theta) - N_j\big)_+, \tag{110}$$

and $\theta$ is chosen such that

$$\sum_j \hbar\omega_j m_j^* = E. \tag{111}$$

(If $m_j^* = 0$, we have in mind in (109) the Gaussian distribution degenerated at 0.)

Since $\Delta H(\pi) = H(S_\pi) - H(S_0)$, we have by (103) and subadditivity of quantum entropy

$$\Delta H(\pi) \le \sum_j \Delta H(\pi^{(j)}), \tag{112}$$

where $\pi^{(j)}$ are the marginal distributions of $\pi$.

Let $m_j$ denote the mean value of $|\alpha_j|^2$ under $\pi^{(j)}$. We first maximize with respect to $\pi^{(j)}$ satisfying

$$\int|\alpha_j|^2\,\pi^{(j)}(d^2\alpha_j) = m_j,$$

and then with respect to $m_j$ satisfying

$$m_j \ge 0, \qquad \sum_j \hbar\omega_j m_j \le E. \tag{113}$$

According to §1, the first maximization is achieved by the Gaussian probability distribution

$$\pi^{(j)}(d^2\alpha_j) = \frac{1}{\pi m_j}\exp\Big(-\frac{|\alpha_j|^2}{m_j}\Big)d^2\alpha_j.$$

We can then take

$$\pi(d^2\alpha) = \prod_j \pi^{(j)}(d^2\alpha_j),$$

for which equality holds in the second relation of (113). For such $\pi$

$$\Delta H(\pi) = \sum_j\big[g(N_j + m_j) - g(N_j)\big]. \tag{114}$$

We have thus arrived at maximizing (114) under the constraint (113), which is similar to the problem of finding the capacity of the quasiclassical multimode photon channel, considered in [Leb 63, 66]. The Kuhn-Tucker conditions for the solution $m_j^*$ of this problem imply equations (110), (111), and the last one is the same as (108). □
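Proposition 3 is a "water-filling" prescription: the parameter $\theta$ is fixed by the energy constraint (108), and only the modes with $N_j(\theta) > N_j$ carry signal. The sketch below assumes NumPy and natural logarithms; the function names, the bisection bracket and the example frequencies are hypothetical choices. The array `hw` stands for the products $\hbar\omega_j$.

```python
import numpy as np

def g(x):
    """g(x) = (x + 1) log(x + 1) - x log x, elementwise, with g(0) = 0."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)
    return (x + 1.0) * np.log1p(x) - np.where(x > 0, x * np.log(safe), 0.0)

def planck(theta, hw):
    """N_j(theta) = 1 / (exp(theta * hw_j) - 1), written to avoid overflow."""
    x = theta * np.asarray(hw, dtype=float)
    return np.exp(-x) / (-np.expm1(-x))

def signal_energy(theta, hw, N):
    """Left-hand side of (108)."""
    return np.sum(hw * np.maximum(planck(theta, hw) - N, 0.0))

def solve_theta(hw, N, E, lo=1e-12, hi=1e3, iters=200):
    """Bisection in log(theta); the signal energy is decreasing in theta."""
    a, b = np.log(lo), np.log(hi)
    for _ in range(iters):
        mid = 0.5 * (a + b)
        if signal_energy(np.exp(mid), hw, N) > E:
            a = mid                      # too much energy: increase theta
        else:
            b = mid
    return np.exp(0.5 * (a + b))

def capacity(hw, N, E):
    """Return (C, theta, m*) according to (107), (108), (110)."""
    hw, N = np.asarray(hw, dtype=float), np.asarray(N, dtype=float)
    theta = solve_theta(hw, N, E)
    m = np.maximum(planck(theta, hw) - N, 0.0)
    return float(np.sum(g(N + m) - g(N))), theta, m

# Two modes: a noiseless low-frequency mode and a noisy high-frequency one.
print(capacity(hw=[1.0, 5.0], N=[0.0, 1.0], E=0.5))
```

In this example the energy is too small to reach the noisy mode, so its optimal signal level is zero and the whole budget goes to the first mode.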

In the case of oscillators in thermal equilibrium [Leb 63, 66], [Bow 67], [Cav 94], $N_j$ themselves are given by the Planck distribution

$$N_j = N_j(\theta_P),$$

where $\theta_P$ is such that

$$P = \sum_j \hbar\omega_j N_j(\theta_P)$$

is the mean energy of the oscillator noise. The entropy (103) of the signal states is

$$s(P) = \sum_j g(N_j(\theta_P)) = \sum_j\Big\{\frac{\theta_P\hbar\omega_j}{e^{\theta_P\hbar\omega_j} - 1} - \log\big(1 - e^{-\theta_P\hbar\omega_j}\big)\Big\}.$$

In this case the formula (107) becomes

$$C = s(P + E) - s(P). \tag{115}$$

In particular for the pure state Gaussian channel, where $N_j \equiv 0$, $P = 0$, one has

$$C = s(E). \tag{116}$$



§4. The quantum Gaussian waveform channel 

We now pass to consideration of a more realistic dynamical model of the Gaussian channel, that of the waveform channel. In classical information theory the waveform channel is treated by reduction to parallel channels, i.e. by decomposing the Gaussian stochastic process into independent one-dimensional modes. In quantum theory such a decomposition plays an additional role as a tool for quantization of the classical stochastic process. The partly heuristic procedure described below is an analog of the classical decomposition into harmonic modes ([Gal 68], §8.3). A mathematical formulation in terms of a quantum stochastic process avoiding this procedure is possible (see the end of this Section), but rigorous calculation of the capacity based on this formulation remains an open problem.

Let us consider the periodic operator-valued function

$$X(t) = \sum_j\sqrt{\frac{2\pi\hbar\omega_j}{T}}\big(a_j e^{-i\omega_j t} + a_j^\dagger e^{i\omega_j t}\big), \qquad t \in [0, T], \tag{117}$$

where $[0, T]$ is the observation interval,

$$\omega_j = \frac{2\pi j}{T}, \qquad j = 1, 2, \dots \tag{118}$$

are the oscillator frequencies and $a_j^\dagger$, $a_j$ the creation-annihilation operators. In quantum electrodynamics a similar relation represents a component of the electric field in a planar wave with periodic boundary conditions on a finite interval (see e.g. [Hel 76]). To avoid problems related to infinite degrees of freedom, we shall restrict summation in (117) to a finite range $I_T$ which will grow with $T$. In the band-limited case, where $0 < \underline\omega < \bar\omega < \infty$, we can put

$$I_T = \{j : \underline\omega \le \omega_j \le \bar\omega\}.$$

In the case of infinite frequency band (where $\underline\omega = 0$, $\bar\omega = \infty$), we shall take

$$I_T = \{j : \underline\omega_T \le \omega_j \le \bar\omega_T\},$$

where $\underline\omega_T \downarrow 0$, $\bar\omega_T \uparrow \infty$.

We have

$$\frac{1}{4\pi}\int_0^T X(t)^2\,dt = \sum_j \hbar\omega_j\Big(a_j^\dagger a_j + \frac12\Big) \tag{119}$$

for the corresponding energy operator.

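We assume that the mode $a_j$ is described by the Gaussian state $S_j(\alpha_j)$ with the first two moments given by (101), so that the whole process $X(t)$ is characterized by the product Gaussian state (102), such that

$$\mathrm{Tr}\, S_\alpha X(t) = \alpha(t),$$

$$\mathrm{Tr}\, S_\alpha\,\frac{1}{4\pi}\int_0^T X(t)^2\,dt = \sum_j \hbar\omega_j\Big(N_j + \frac12\Big) + \frac{1}{4\pi}\int_0^T \alpha(t)^2\,dt, \tag{120}$$

where

$$\alpha(t) = \sum_j\sqrt{\frac{2\pi\hbar\omega_j}{T}}\big(\alpha_j e^{-i\omega_j t} + \bar\alpha_j e^{i\omega_j t}\big), \tag{121}$$

$$\frac{1}{4\pi}\int_0^T \alpha(t)^2\,dt = \sum_j \hbar\omega_j|\alpha_j|^2. \tag{122}$$

The signal is thus represented by the real function $\alpha(t)$ and the mean power constraint on the signal is as follows

$$\frac{1}{4\pi}\int_0^T \alpha(t)^2\,dt \le TE. \tag{123}$$

A code $(W, X)$ for such a channel is a collection $(\alpha^1(\cdot), X_1), \dots, (\alpha^M(\cdot), X_M)$, where $\alpha^k(\cdot)$ are functions of $t \in [0, T]$ representing different signals; one defines the capacity of the channel as the supremum of values of $R$ for which the infimum of the average error probability

$$\lambda(W, X) = \frac1M\sum_{k=1}^M\big(1 - \mathrm{Tr}\, S_{\alpha^k}X_k\big) \tag{124}$$

with respect to all codes of the size $M = e^{TR}$ tends to zero as $T \to \infty$.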

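Proposition 4. Assume that $N_j = N(\omega_j)$, where $N(\omega)$ is a continuous function. The
capacity of the Gaussian waveform channel as defined above exists and is equal to

$$C = \frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}} \bigl(g(N_\theta(\omega)) - g(N(\omega))\bigr)_+\,d\omega, \tag{125}$$

where

$$N_\theta(\omega) = \frac{1}{e^{\theta\hbar\omega} - 1}, \tag{126}$$

and $\theta$ is chosen such that

$$\frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}} \hbar\omega\,\bigl(N_\theta(\omega) - N(\omega)\bigr)_+\,d\omega = E. \tag{127}$$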
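The "water-filling" recipe (125)-(127) is straightforward to evaluate numerically. The sketch below is only an illustration under stated assumptions: units with $\hbar = 1$, $g(x) = (x+1)\log(x+1) - x\log x$, a midpoint quadrature, and bisection in $\theta$ over a fixed bracket. The function names (`capacity`, `power`, `n_theta`) and the numerical parameters are hypothetical choices, not taken from the paper.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1


def g(x):
    """g(x) = (x+1) log(x+1) - x log x, with g(0) = 0."""
    if x <= 0.0:
        return 0.0
    return (x + 1.0) * math.log(x + 1.0) - x * math.log(x)


def n_theta(theta, w):
    """N_theta(w) = 1 / (exp(theta*hbar*w) - 1), eq. (126)."""
    x = theta * HBAR * w
    if x > 700.0:  # exp would overflow; the occupation number is negligible here
        return 0.0
    return 1.0 / math.expm1(x)


def integrate(f, a, b, steps=4000):
    """Midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


def power(theta, noise, w_lo, w_hi):
    """Left-hand side of the constraint (127)."""
    f = lambda w: HBAR * w * max(n_theta(theta, w) - noise(w), 0.0)
    return integrate(f, w_lo, w_hi) / (2.0 * math.pi)


def capacity(noise, energy, w_lo, w_hi):
    """Solve (127) for theta by bisection, then evaluate (125)."""
    lo, hi = 1e-6, 1e3  # assumed bracket; power() decreases as theta grows
    for _ in range(80):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if power(mid, noise, w_lo, w_hi) > energy:
            lo = mid
        else:
            hi = mid
    theta = math.sqrt(lo * hi)
    f = lambda w: max(g(n_theta(theta, w)) - g(noise(w)), 0.0)
    return theta, integrate(f, w_lo, w_hi) / (2.0 * math.pi)
```

For example, `capacity(lambda w: 0.0, 1.0, 0.0, 60.0)` treats a noiseless (pure state) channel on a band wide enough to approximate the infinite band; the result should then be close to the closed-form value given in (147).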



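Proof. We start by considering the band-limited case and first prove that
$\inf_{W,X}\lambda(W,X) \not\to 0$ for $R > C$. From the Fano inequality of the type (58) and Proposition 3 we have

$$TR\cdot\Bigl(1 - \inf_{W,X}\lambda(W,X)\Bigr) \le C_T + 1, \tag{128}$$

where

$$C_T = \max \sum_{j\in I_T}\bigl[g(N_j + m_j) - g(N_j)\bigr] \tag{129}$$

and the maximum is taken over the set

$$m_j \ge 0, \qquad \frac{1}{2\pi}\sum_{j\in I_T}\hbar\omega_j m_j\,\Delta\omega_j \le E,$$

with

$$\Delta\omega_j = \frac{2\pi}{T}. \tag{130}$$

Introducing the piecewise constant function

$$N_T(\omega) = N_j, \qquad \omega_{j-1} < \omega \le \omega_j,$$

we have

$$\frac{C_T}{T} \le \max_{\mathcal{M}(\underline{\omega},\overline{\omega})} \frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}} \bigl[g(N_T(\omega) + m(\omega)) - g(N_T(\omega))\bigr]\,d\omega,$$

where

$$\mathcal{M}(\underline{\omega},\overline{\omega}) = \Bigl\{m(\cdot) : m(\omega) \ge 0;\ \frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}} \hbar\omega\, m(\omega)\,d\omega \le E\Bigr\}.$$

Since $N(\omega)$ is continuous, it is uniformly continuous on $[\underline{\omega},\overline{\omega}]$ and therefore $N_T(\omega)$ tends
uniformly to $N(\omega)$ as $T \to \infty$. It follows that

$$\limsup_{T\to\infty}\frac{C_T}{T} \le \max_{\mathcal{M}(\underline{\omega},\overline{\omega})} \frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}} \bigl[g(N(\omega) + m(\omega)) - g(N(\omega))\bigr]\,d\omega.$$

However, this maximum is achieved on

$$m^*(\omega) = \bigl(N_\theta(\omega) - N(\omega)\bigr)_+, \tag{131}$$

and is equal to $C$ as defined by (125) - (127). Therefore from (128) we conclude that
$\inf_{W,X}\lambda(W,X) \not\to 0$ for $R > C$.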

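We now show that the average error probability tends to zero if $R < C$. Let $\pi(d^2\alpha)$
be the Gaussian probability distribution (109), where $m_j^*$ are given by (110) with $\theta = \theta_T$
chosen such that

$$\frac{1}{2\pi}\sum_{j\in I_T}\hbar\omega_j m_j^*\,\Delta\omega_j = E. \tag{132}$$

Applying the basic bound (50) with the word length $n = 1$ and with $\delta$ replaced by $\delta T$,
we have

$$\inf_X \lambda(W,X) \le \frac{1}{M}\sum_{j=1}^{M}\Bigl\{4\operatorname{Tr} S_{\alpha^j}(I - P) + 4\operatorname{Tr} S_{\alpha^j}(I - P_{\alpha^j}) + \sum_{k\ne j}\operatorname{Tr} P S_{\alpha^j} P P_{\alpha^k}\Bigr\}, \tag{133}$$

where $P$ is the spectral projection of $\bar{S}_\pi$ corresponding to the eigenvalues in the range
$\bigl(e^{-[H(\bar{S}_\pi)+\delta T]}, e^{-[H(\bar{S}_\pi)-\delta T]}\bigr)$, and $P_\alpha$ is the spectral projection of $S_\alpha$ corresponding to the eigenvalues
in the range $\bigl(e^{-[H(S_{(\cdot)})+\delta T]}, e^{-[H(S_{(\cdot)})-\delta T]}\bigr)$. Since the $S_\alpha$ are unitarily equivalent to $S_0$,
$H(S_{(\cdot)}) = H(S_0)$ and the second term in (133) is simply

$$\operatorname{Tr} S_0(I - P_0), \tag{134}$$

which is similar to

$$\operatorname{Tr} \bar{S}_\pi(I - P). \tag{135}$$

We wish to estimate the terms (134), (135) for the Gaussian density operators $S_0$, $\bar{S}_\pi$.
For definiteness let us take (134). We have

$$\operatorname{Tr} S_0(I - P_0) = \Pr\bigl\{\,|-\log\lambda_{(\cdot)} - H(S_0)| \ge \delta T\,\bigr\}, \tag{136}$$

where $\Pr$ is the distribution of eigenvalues $\lambda_{(\cdot)}$ of $S_0$. By the Chebyshev inequality, this is less than
or equal to $\mathsf{D}(\log\lambda_{(\cdot)})/\delta^2T^2$. Now $\mathsf{D}(\log\lambda_{(\cdot)}) = \sum_j \mathsf{D}_j(\log\lambda_{(\cdot)})$, where $\mathsf{D}_j$ is the variance
of $\log\lambda_{(\cdot)}$ for the $j$-th mode. From (84) we see that the eigenvalues of $S_j(0)$ are

$$\lambda_n^j = \frac{1}{N_j + 1}\left(\frac{N_j}{N_j + 1}\right)^n, \qquad n = 0, 1, \dots,$$

hence

$$\mathsf{D}_j(\log\lambda_{(\cdot)}) = \sum_{n=0}^{\infty}\bigl(-\log\lambda_n^j - H(S_j(0))\bigr)^2\lambda_n^j \tag{137}$$

$$= \log^2\frac{N_j + 1}{N_j}\,\sum_{n=0}^{\infty}(n - N_j)^2\lambda_n^j = F(N_j), \tag{138}$$

where

$$F(x) = x(x+1)\log^2\frac{x+1}{x}$$

is a bounded function on $(0, \infty)$. Thus finally

$$\operatorname{Tr} S_0(I - P_0) \le \sum_{j\in I_T}\frac{F(N_j)}{\delta^2T^2}, \tag{139}$$

and a similar estimate holds for $\operatorname{Tr}\bar{S}_\pi(I - P)$ with $N_j$ replaced by $N_j + m_j^*$.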
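The single-mode variance formula (137)-(138) can be confirmed by direct summation over the geometric eigenvalue distribution. The sketch below is an illustration only; the truncation level `n_max` and the function names are assumptions made for the example.

```python
import math


def F(x):
    """F(x) = x (x+1) log^2((x+1)/x), the closed form in (138)."""
    return x * (x + 1.0) * math.log((x + 1.0) / x) ** 2


def log_variance(n_mean, n_max=20000):
    """Variance of -log(lambda_n) under the distribution lambda_n itself.

    lambda_n = (1 - q) q^n with q = N/(N+1); logarithms are computed
    analytically so that underflowing eigenvalues simply contribute zero.
    """
    q = n_mean / (n_mean + 1.0)
    log_lams = [math.log1p(-q) + n * math.log(q) for n in range(n_max)]
    lams = [math.exp(l) for l in log_lams]
    entropy = -sum(lam * l for lam, l in zip(lams, log_lams))
    return sum(lam * (-l - entropy) ** 2 for lam, l in zip(lams, log_lams))
```

For moderate mean photon numbers the truncated sum `log_variance(N)` should reproduce `F(N)` to high accuracy, and sampling `F` on a grid illustrates the boundedness used in (139).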

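Now let the words $\alpha^1, \dots, \alpha^M$ be taken randomly with the joint probability distribution
$\tilde{\mathsf{P}}$ defined similarly to (59), starting from the probability distribution $\mathsf{P}$ with respect to
which the words are independent and have the same probability distribution $\pi(d^2\alpha)$. Then
$\tilde{\mathsf{E}}\xi \le 2^m\,\mathsf{E}\xi$ for any nonnegative random variable $\xi$ depending on $m$ words. Therefore from (133)

$$\tilde{\mathsf{E}}\inf_X\lambda(W,X) \le 8\,\mathsf{E}\operatorname{Tr} S_{\alpha}(I - P) + 4\operatorname{Tr} S_0(I - P_0) + \frac{1}{M}\sum_{j=1}^{M}\sum_{k\ne j} 4\,\mathsf{E}\operatorname{Tr} P S_{\alpha^j} P P_{\alpha^k}$$

$$= 8\operatorname{Tr}\bar{S}_\pi(I - P) + 4\operatorname{Tr} S_0(I - P_0) + 4(M-1)\,e^{-[H(\bar{S}_\pi) - H(S_0) - 2\delta T]}$$

$$\le 8\sum_{j\in I_T}\frac{F(N_j + m_j^*)}{\delta^2T^2} + 4\sum_{j\in I_T}\frac{F(N_j)}{\delta^2T^2} + 4(M-1)\,e^{-[C_T - 2\delta T]}.$$

Since the function $F(x)$ is bounded and the size of $I_T$ is proportional to $T$, the sums in
the first two terms have the order $T$, the terms themselves having the order $T^{-1}$. To
complete the proof we have only to show that

$$\liminf_{T\to\infty}\frac{C_T}{T} \ge C. \tag{140}$$

Let $m^*(\omega)$ be the function (131), and let $\omega_j'$ be the point on the segment $[\omega_{j-1}, \omega_j]$ at
which it achieves its minimum; then

$$\frac{1}{2\pi}\sum_{j\in I_T}\hbar\omega_j' m^*(\omega_j')\,\Delta\omega_j \le \frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}}\hbar\omega\, m^*(\omega)\,d\omega = E,$$

hence

$$\frac{C_T}{T} \ge \frac{1}{2\pi}\sum_{j\in I_T}\bigl[g(N_j + m^*(\omega_j')) - g(N_j)\bigr]\,\Delta\omega_j.$$

Since both $N(\omega)$ and $m^*(\omega)$ are continuous, the last sum tends to

$$\frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}}\bigl[g(N(\omega) + m^*(\omega)) - g(N(\omega))\bigr]\,d\omega = C,$$

and the proof is completed.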

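We now turn to the case of the infinite frequency band $(0, \infty)$. By applying an argument
similar to the one given above, one sees that it is sufficient to show that

$$\lim_{T\to\infty}\frac{C_T}{T} = C(0,\infty) \equiv \max_{\mathcal{M}(0,\infty)}\frac{1}{2\pi}\int_0^\infty\bigl[g(N(\omega) + m(\omega)) - g(N(\omega))\bigr]\,d\omega, \tag{141}$$

where

$$\mathcal{M}(0,\infty) = \Bigl\{m(\cdot) : m(\omega) \ge 0,\ \frac{1}{2\pi}\int_0^\infty\hbar\omega\, m(\omega)\,d\omega \le E\Bigr\}. \tag{142}$$

The maximum is achieved on the function $m^*(\omega)$ of the form (131), where $\theta$ is such that

$$\frac{1}{2\pi}\int_0^\infty\hbar\omega\, m^*(\omega)\,d\omega = E.$$

Let us take $0 < \underline{\omega} < \overline{\omega} < \infty$. By omitting frequencies outside this band, we obtain

$$\liminf_{T\to\infty}\frac{C_T}{T} \ge \max_{\mathcal{M}(\underline{\omega},\overline{\omega})}\frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}}\bigl[g(N(\omega) + m(\omega)) - g(N(\omega))\bigr]\,d\omega$$

$$\ge \frac{1}{2\pi}\int_{\underline{\omega}}^{\overline{\omega}}\bigl[g(N(\omega) + m^*(\omega)) - g(N(\omega))\bigr]\,d\omega,$$

since $m^*(\cdot) \in \mathcal{M}(\underline{\omega},\overline{\omega})$. Taking the limit as $\underline{\omega} \to 0$, $\overline{\omega} \to \infty$, we prove the $\ge$ part of (141).
To prove the $\le$ part, we consider the relation

$$\frac{C_T}{T} = \frac{1}{2\pi}\sum_j\bigl[g(N_j + m_j^*) - g(N_j)\bigr]\,\Delta\omega_j, \tag{143}$$

where

$$m_j^* = \left(\frac{1}{e^{\theta_T\hbar\omega_j} - 1} - N_j\right)_+,$$

and $\theta_T$ is chosen in such a way that

$$\frac{1}{2\pi}\sum_j\hbar\omega_j m_j^*\,\Delta\omega_j = E. \tag{144}$$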

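By considering the piecewise constant functions

$$N_T(\omega) = N_j, \qquad m_T(\omega) = m_j^* \qquad \text{for } \omega_{j-1} < \omega \le \omega_j,$$

we can write the right hand side of (143) as

$$\frac{1}{2\pi}\int_0^\infty\bigl[g(N_T(\omega) + m_T(\omega)) - g(N_T(\omega))\bigr]\,d\omega$$

$$= \frac{1}{2\pi}\int_0^\infty\bigl[g(N(\omega) + m_T(\omega)) - g(N(\omega))\bigr]\,d\omega$$

$$+ \frac{1}{2\pi}\int_0^\infty\bigl[g(N_T(\omega) + m_T(\omega)) - g(N(\omega) + m_T(\omega)) + g(N(\omega)) - g(N_T(\omega))\bigr]\,d\omega.$$

Taking into account that

$$\frac{1}{2\pi}\int_0^\infty\hbar\omega\, m_T(\omega)\,d\omega \le \frac{1}{2\pi}\sum_j\hbar\omega_j m_j^*\,\Delta\omega_j = E,$$

we see that the first term is less than or equal to

$$\max_{\mathcal{M}(0,\infty)}\frac{1}{2\pi}\int_0^\infty\bigl[g(N(\omega) + m(\omega)) - g(N(\omega))\bigr]\,d\omega = C(0,\infty).$$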

It remains to show that the second term tends to zero. We shall show it by using the
Lebesgue dominated convergence theorem. Since $N(\omega)$ is continuous, $N_T(\omega) \to N(\omega)$ and
$g(N_T(\omega)) \to g(N(\omega))$ pointwise. Next we observe that $\theta_T$ is separated from 0 as $T \to \infty$,
that is $\theta_T \ge \theta_0 > 0$. Indeed, assume that $\theta_T \downarrow 0$ for some sequence $T \to \infty$; then the
sequence of continuous functions

$$\frac{1}{e^{\theta_T\hbar\omega} - 1}$$

converges to $\infty$ uniformly in every interval $0 < \underline{\omega} \le \omega \le \overline{\omega} < \infty$, which contradicts
the condition (144). It follows that for any $\omega > 0$ the quantity

$$N_T(\omega) + m_T(\omega) = \max\Bigl\{N_T(\omega),\ \frac{1}{e^{\theta_T\hbar\omega_j} - 1}\Bigr\}, \qquad \omega_{j-1} < \omega \le \omega_j,$$

is bounded as $T \to \infty$. Since $g(x)$ is uniformly continuous on any bounded interval, it
follows that

$$g(N_T(\omega) + m_T(\omega)) - g(N(\omega) + m_T(\omega)) \to 0.$$

It remains to show that the integrand is dominated by an integrable function. Taking
into account that $g''(x) < 0$ for $x > 0$, we deduce that $g(x + y) - g(x) \le g(y)$ for $x, y > 0$.
Therefore the integrand is dominated by the function $2g(m_T(\omega))$. But

$$m_T(\omega) \le \frac{1}{e^{\theta_T\hbar\omega_j} - 1} \le \frac{1}{e^{\theta_0\hbar\omega} - 1}.$$

Thus

$$g(m_T(\omega)) \le g\left(\frac{1}{e^{\theta_0\hbar\omega} - 1}\right) = \frac{\theta_0\hbar\omega}{e^{\theta_0\hbar\omega} - 1} - \log\bigl(1 - e^{-\theta_0\hbar\omega}\bigr),$$

which is a positive integrable function. □
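Two elementary facts used in the domination argument, the subadditivity $g(x+y) - g(x) \le g(y)$ and the closed form of $g$ evaluated at a Planck-type occupation number, can be spot-checked numerically. This is a minimal sketch assuming $g(x) = (x+1)\log(x+1) - x\log x$; the helper names are illustrative.

```python
import math


def g(x):
    """g(x) = (x+1) log(x+1) - x log x, with g(0) = 0."""
    if x <= 0.0:
        return 0.0
    return (x + 1.0) * math.log(x + 1.0) - x * math.log(x)


def g_planck(x):
    """Closed form of g(1/(e^x - 1)) for x > 0: x/(e^x - 1) - log(1 - e^{-x})."""
    return x / math.expm1(x) - math.log(-math.expm1(-x))


def is_subadditive(points, tol=1e-12):
    """Check g(x+y) - g(x) <= g(y) on all pairs from a finite grid."""
    return all(g(x + y) - g(x) <= g(y) + tol for x in points for y in points)
```

Here `x` in `g_planck` plays the role of the dimensionless product $\theta_0\hbar\omega$; the dominating function decays exponentially for large `x` and has only a logarithmic singularity at zero, which is why it is integrable.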

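These formulas take an especially nice form for the equilibrium noise, when $N(\omega) =
N_{\theta_P}(\omega) = (e^{\theta_P\hbar\omega} - 1)^{-1}$ with $\theta_P$ determined from

$$\frac{1}{2\pi}\int_0^\infty\frac{\hbar\omega}{e^{\theta_P\hbar\omega} - 1}\,d\omega = P.$$

By using the formula

$$\int_0^\infty\frac{x}{e^x - 1}\,dx = \frac{\pi^2}{6},$$

one finds $\theta_P = \sqrt{\pi/12\hbar P}$, and

$$s(P) = \frac{1}{2\pi}\int_0^\infty g(N_{\theta_P}(\omega))\,d\omega
= \frac{1}{2\pi}\int_0^\infty\left[\frac{\theta_P\hbar\omega}{e^{\theta_P\hbar\omega} - 1} - \log\bigl(1 - e^{-\theta_P\hbar\omega}\bigr)\right]d\omega = \frac{\pi}{6\hbar\theta_P} = \sqrt{\frac{\pi P}{3\hbar}}, \tag{145}$$

whence

$$C(0,\infty) = s(P + E) - s(P) = \sqrt{\frac{\pi(P + E)}{3\hbar}} - \sqrt{\frac{\pi P}{3\hbar}}, \tag{146}$$

which coincides with the capacity of the infinite band photon channel calculated in
[Leb 63, 66]. For the pure state channel

$$C(0,\infty) = \sqrt{\frac{\pi E}{3\hbar}}, \tag{147}$$

a formula found also in [Bow 67].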
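The closed-form expressions for the equilibrium noise can be cross-checked against a direct numerical integration of the Planck spectrum. The sketch below assumes units with $\hbar = 1$ and assumes that the capacity with equilibrium noise is the difference $s(P+E) - s(P)$, which reduces to (147) for $P = 0$; the function names and the integration cutoff are illustrative choices.

```python
import math

HBAR = 1.0  # assumption: units with hbar = 1


def theta_p(p):
    """theta_P = sqrt(pi / (12 hbar P)) for noise power P > 0."""
    return math.sqrt(math.pi / (12.0 * HBAR * p))


def entropy_rate(p):
    """s(P) = sqrt(pi P / (3 hbar)), the closed form in (145)."""
    return math.sqrt(math.pi * p / (3.0 * HBAR))


def capacity_equilibrium(p, e):
    """Assumed form s(P+E) - s(P); equals sqrt(pi E / (3 hbar)) when P = 0."""
    return entropy_rate(p + e) - entropy_rate(p)


def planck_power(theta, steps=6000, cutoff=60.0):
    """(1/2pi) * integral of hbar*w / (exp(theta*hbar*w) - 1) over (0, infinity).

    Midpoint rule in the dimensionless variable x = theta*hbar*w, truncated
    at x = cutoff, where the integrand is already negligible.
    """
    h = cutoff / steps
    total = sum((k + 0.5) * h / math.expm1((k + 0.5) * h) for k in range(steps)) * h
    return total / (2.0 * math.pi * HBAR * theta ** 2)
```

With these definitions, `planck_power(theta_p(P))` should return approximately `P`, confirming the value of $\theta_P$ obtained from the integral $\pi^2/6$.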



Let us now formulate the problem more in the spirit of the treatment of Gaussian
noise in classical information theory. In the limit $T \to \infty$ one expects that the periodic
process (117) turns into the "signal + noise" process

$$X(t) = \alpha(t) + Y(t), \qquad t \ge 0,$$

where $\alpha(t)$ is the classical signal and $Y(t)$ is the quantum Gaussian noise

$$Y(t) = \int_0^\infty\sqrt{\hbar\omega}\left(dA_\omega e^{-i\omega t} + dA_\omega^{\dagger} e^{i\omega t}\right). \tag{148}$$

Here $A_\omega$ is the quantum Gaussian independent increment process having the commutator

$$[dA_\omega, dA_\lambda^{\dagger}] = \delta(\omega - \lambda)\,d\omega\,d\lambda,$$

zero mean, and the correlation

$$\langle dA_\omega^{\dagger}\,dA_\lambda\rangle = \delta(\omega - \lambda)N(\omega)\,d\omega\,d\lambda$$

(all other commutators and correlations vanish).

By using this with (148) one obtains the noise commutator

$$[Y(t), Y(s)] = 2i\hbar\int_0^\infty\omega\sin\omega(s - t)\,d\omega = 2i\hbar\pi\,\delta'(t - s), \tag{149}$$

and the noise correlation function

$$\langle Y(t)Y(s)\rangle = B(t - s) + K(t - s), \tag{150}$$

with

$$B(t) = 2\hbar\int_0^\infty\omega N(\omega)\cos\omega t\,d\omega$$

and

$$K(t) = \hbar\int_0^\infty\omega e^{-i\omega t}\,d\omega = -\hbar\bigl[t^{-2} - i\pi\delta'(t)\bigr]$$

is the zero temperature correlation.

The process $X(t)$ is observed on the time interval $[0, T]$, which means that one considers
the Gaussian (quasifree) state with mean $\alpha(t)$ and the correlation function (150) on
the algebra of canonical commutation relations generated by $X(t)$, $t \in [0,T]$, which is
determined by the commutator (149) (see e.g. [Hol 76]). One imposes the power constraint
(123) and defines the capacity in the same way as we have done before Proposition 4. The
proof makes it plausible that the capacity of the "signal + noise" process is just the one given
by that Proposition. However, an attempt to prove this following the classical method
of reduction to parallel channels [Gal 68] meets the following new difficulties. A minor
problem is that the kernels (149), (150) are now generalized functions. More important
is that in the classical case one has two quadratic forms, correlation and energy (which
is just the $L^2$ inner product), that are simultaneously diagonalized by solving the integral
equation with the kernel (150). In the quantum case one has an additional skew-symmetric
form, the commutator, which also should be transformed to a canonical representation
allowing decomposition into parallel channels. However, this is not always possible. The
proof of Proposition 4 shows that in a sense this happens asymptotically (as $T \to \infty$),
but a rigorous proof of that is still lacking.



VI. Some open problems 



Several questions remain, some of which were mentioned in the text. Let us recall
them, adding a few further problems and comments:

1) Superadditivity of the entropy bound for a general quantum channel [Ben 96]; our
conjecture is that if channels with this property exist at all, they might be found in a
neighbourhood of the identity channel. The perturbation of the identity should be truly
quantum and irreversible, and small enough to enable neglecting the probability of more than
one error in the product channel. One then may try to find input states exhibiting the
superadditivity property by using quantum codes correcting one error [Cal 96], [Ste 97];

2) Finding practical block codes with substantial gain from the strict superadditivity
of the capacity [Sas 97];

3) Exponential upper bound for error probability in the c-q channel for general signal
states, allowing for a lower bound of the quantum reliability function, see [Bur 97];

4) Lower bound for error probability, at least for the pure state channel (an analog of
the sphere-packing bound), see [Bur 97];

5) Consistent treatment of the quantum Gaussian waveform channel as described at the
end of §V.4.

All these problems address transmission of classical information through quantum
channels. There is a yet "more quantum" domain of problems concerning reliable transmission
of entire quantum states under a given fidelity criterion [Ben 97]. The very definition
of the relevant "quantum information" is far from obvious. Important steps in this direction
were made in [Bar 97], where in particular a tentative converse of the relevant
coding theorem was suggested. However, the proof of the corresponding direct theorem
remains an open question.
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